0
votes

I am trying to use Get Metadata activity in Azure Data Factory in order to get blob filenames and copy them to Azure SQL database table. I follow this tutorial: https://www.mssqltips.com/sqlservertip/6246/azure-data-factory-get-metadata-example/

Here is my pipeline, Copy Data > Source is the source destination of the blob files in my Blob storage. I need to specify my source file as binary because they are *.jpeg files.

GetMetadata0

For my Copy Data > Sink, its the Azure SQL database, I enable the option "Auto Create table"

GetMetadata1

In my Sink dataset config, I had to choose one table because the validation won't pass if I don't select the table in my SQL database even though this table is not related at all to the blob filenames that I want to get.

GetMetadata2

Question 1: Am I supposed to create a new table in SQL DB before to have the columns matching the blob filenames that I want to extract?

Then, I tried to validate the pipeline and I get this error.

Copy_Data_1
Sink must be binary when source is binary dataset.

Getmetadata3

Question 2: How can I resolve this error? I had to select the file type of the source as binary as it's one of the step when creating source dataset. Therefore, when I choose sink dataset that is Azure SQL table, I didn't have to select the type of dataset so it doesn't seem to match.

Thank you very much in advance.

New screenshot of the new pipeline, I can now get itemName of filenames in the json output files.

GetFileName1

Now I add Copy Data activity just after Get_File_Name2 activity and connect them together to try to get the json output files as source dataset.

GetFileName2

However, I need to choose the source dataset location first before specify type as json. But, as far as I understand these output json files are the output from Get_File_Name2 activity and they are not yet stored on Blob storage. How do I make the copy data activity reading these json output file as source dataset?

GetFileName3

Update 10/14/2020 Here is my new activity stored procedure, I added the parameter as suggested however, I changed the name to JsonData as my stored procedure requires this parameter.

storedprocedure1

This is my stored procedure.

storedprocedure2

I get this error at the stored procedure:

{
    "errorCode": "2402",
    "message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
    "failureType": "UserError",
    "target": "Stored procedure1",
    "details": []
}

storedprocedure3

But when I check the input, it seems like it already successfully reading the json string itemName.

storedprocedure4

But, when I check output, it's not there.

storedprocedure5

1
The source files are *.jpeg(image) files not data file like csv, and Binary format can not load to Azure SQL database. It's not supported. We can not help you solve it!Leon Yue
Hello @Leon Yue could you please check my steps again? I didn’t try to load jpeg files themselves into SQL table of course that wouldn’t make sense as we don’t keep image files i. Relational DB. I was trying to get the filenames of jpeg files and put them in SQL table. That’s why I tried to use Get Metadata, is this the right way to do?Shuti
Yes, it is. But please don't use binary as the source in copy active. Your whole workflow may be: Get metadata(get all the .jpeg files name)--> foreach(loop the filename)--> if condition( I don't know what it used for)-->true active(load to Azure SQL).Leon Yue
Auto create table doesn't work for filename! It depends on the file data. You need create a table to receive these filenames.Leon Yue
Hello @Leon Yue thank you very much for your suggestion. I also found similar solution so I modified my pipeline like this: Get Metadata 1 with dataset pointing to blob files on blob storage, here I add file list = Child items Then this is connected to ForEach loop with setting @activity('Get_File_Name1').output.childItems and with activity inside of Get Metadata also pointing to blob storage with FileName value as @item().name. After running this pipeline, it seems getting close as in json output files, I get itemName value of filename. Please, see screenshot above.Shuti

1 Answers

1
votes

Actually, you may could using Get metadata output json as the parameter and then call the stored procedure: Get metedata-->Stored Procedure!

You just need focus on the coding of the stored procedure.

Get Metadata output childitems:

{
   "childItems": [
        {
            "name": "DeploymentFiles.zip",
            "type": "File"
        },
        {
            "name": "geodatalake.pdf",
            "type": "File"
        },
        {
            "name": "test2.xlsx",
            "type": "File"
        },
        {
            "name": "word.csv",
            "type": "File"
        }
}

Stored Procedure:

@activity('Get Metadata1').output.childitems

enter image description here

About how to create the stored procedure(get data from json object), you could ref this blog: Retrieve JSON Data from SQL Server using a Stored Procedure.