When you want to load an XML file into a SQL database, you are using a hierarchical source to tabular sink approach.
When copying data from a hierarchical source to a tabular sink, the copy activity supports the following capabilities:
- Extract data from objects and arrays.
- Cross apply multiple objects with the same pattern from an array, converting one JSON object into multiple records in the tabular result.
You can define such a mapping in the Data Factory authoring UI:
1. On the copy activity's Mapping tab, click the Import schemas button to import both the source and sink schemas. Because Data Factory samples only the top few objects when importing the schema, a field may not show up; you can add it to the correct layer in the hierarchy by hovering over an existing field name and choosing to add a node, an object, or an array.
2. Select the array from which you want to iterate and extract data. It will be auto-populated as the Collection reference. Note that only a single array is supported for this operation.
3. Map the needed fields to the sink. Data Factory automatically determines the corresponding JSON paths for the hierarchical side.
Note: For records where the array marked as the collection reference is empty and the check box is selected, the entire record is skipped.
Here I am using a sample XML file at the source.
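As a sketch of what such a file might look like (the element names here are hypothetical placeholders), note the repeating <Order> element; that is the array that will be cross-applied into rows:

```xml
<!-- Hypothetical sample; element names are placeholders -->
<Orders>
  <Order>
    <OrderId>1001</OrderId>
    <Customer>Alice</Customer>
    <Amount>250.00</Amount>
  </Order>
  <Order>
    <OrderId>1002</OrderId>
    <Customer>Bob</Customer>
    <Amount>75.50</Amount>
  </Order>
</Orders>
```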

Notice that I have used a dataset parameter, to which I will assign the file name value obtained from the trigger. I have placed it in the file name field of the file path property in the dataset connection.
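In JSON terms, the parameterized source dataset looks roughly like this; a minimal sketch, assuming an XML dataset named SourceXmlDataset on a blob container named input, with a dataset parameter fileName:

```json
{
  "name": "SourceXmlDataset",
  "properties": {
    "type": "Xml",
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "fileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": {
          "value": "@dataset().fileName",
          "type": "Expression"
        }
      }
    }
  }
}
```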

Next, I have created a pipeline parameter to hold the input obtained from the trigger before assigning it to the dataset parameter.
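In the pipeline JSON this is just a parameter declaration (the pipeline and parameter names here are assumptions):

```json
{
  "name": "CopyXmlToSqlPipeline",
  "properties": {
    "parameters": {
      "triggerFileName": { "type": "string" }
    }
  }
}
```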

Create a storage event trigger.

Click Continue and you will find a preview of all the files that match the trigger conditions.

When you move to the next screen, any pipeline parameters you have created (as we have here) will be listed there.

Fill in the value as per your need; see the available system variables under Storage event trigger scope.
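For reference, a storage event trigger that passes the file name to the pipeline via @triggerBody().fileName might look like the sketch below; the trigger name, blob paths, and scope are placeholders:

```json
{
  "name": "BlobCreatedTrigger",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/input/blobs/",
      "blobPathEndsWith": ".xml",
      "ignoreEmptyBlobs": true,
      "events": [ "Microsoft.Storage.BlobCreated" ],
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyXmlToSqlPipeline",
          "type": "PipelineReference"
        },
        "parameters": {
          "triggerFileName": "@triggerBody().fileName"
        }
      }
    ]
  }
}
```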
Now, let's move to the Copy data activity. Here you will find the dataset parameter; assign the pipeline parameter value to it.
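In the copy activity's JSON, that assignment appears on the source dataset reference (dataset and parameter names as assumed above):

```json
"inputs": [
  {
    "referenceName": "SourceXmlDataset",
    "type": "DatasetReference",
    "parameters": {
      "fileName": {
        "value": "@pipeline().parameters.triggerFileName",
        "type": "Expression"
      }
    }
  }
]
```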

Now move to the Sink tab in the copy activity. Since you want the source schema to be carried through to the sink, the best way is to select Auto create table.
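Under the hood this corresponds to the tableOption setting on the SQL sink; a minimal sketch, assuming an Azure SQL Database sink:

```json
"sink": {
  "type": "AzureSqlSink",
  "tableOption": "autoCreate"
}
```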

This requires appropriate changes in the sink dataset. To configure the sink dataset, choose Edit for the table and manually enter a name that does not already exist on your server; a new table with this name will be created in the SQL server specified in the sink. Make sure you clear all schema, as you will be getting the source schema in the copy activity.
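The sink dataset then only needs to name the new table; a sketch, assuming a dataset named SinkSqlDataset and the table dbo.NewTable created later in this walkthrough, with the schema list left empty:

```json
{
  "name": "SinkSqlDataset",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabaseLinkedService",
      "type": "LinkedServiceReference"
    },
    "schema": [],
    "typeProperties": {
      "schema": "dbo",
      "table": "NewTable"
    }
  }
}
```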


Back on the Mapping tab in the copy activity, click Import schemas and select the fields you want to copy to the table. You can additionally specify the data types; setting the Collection reference is necessary.
Refer: Parameterize mapping
You can also switch to the Advanced editor, in which case you can directly see and edit the fields' JSON paths. If you choose to add a new mapping in this view, specify the JSON path.
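In the Advanced editor, the same mapping is expressed as a TabularTranslator with explicit JSON paths and a collectionReference; the paths below follow the hypothetical sample file above, with fields inside the collection using paths relative to the array element:

```json
"translator": {
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "path": "['OrderId']" },  "sink": { "name": "OrderId",  "type": "String" } },
    { "source": { "path": "['Customer']" }, "sink": { "name": "Customer", "type": "String" } },
    { "source": { "path": "['Amount']" },   "sink": { "name": "Amount",   "type": "Decimal" } }
  ],
  "collectionReference": "$['Orders']['Order']"
}
```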
So when a file is created in the storage account, a Blob created event fires and the pipeline runs.

You can see the new table "dbo.NewTable" created under ktestsql, and it has the data from the XML as rows.
