2
votes

I am trying to copy data from csv files found in Azure Datalake to a table in SQL Server. As a condition, I only want to copy files that are not already in the database by using a file name column found in the database table. Below is my attempt (I have not been able to find an example of this on the internet).

I first use a Get Metadata activity that retrieves all the files that are in the data lake. Concurrently, I use a Lookup activity that is getting the distinct file names in the table of choice. First image shows the organization of activities.

enter image description here

Here are the outputs for the Get Metadata and Lookup activities.

Datalake files Metadata Output

{
    "childItems": [
        {
            "name": "surveydetails_eq5d_001.csv",
            "type": "File"
        },
        {
            "name": "surveydetails_koos_001.csv",
            "type": "File"
        },
        {
            "name": "surveydetails_oxford_001.csv",
            "type": "File"
        },
        {
            "name": "surveydetails_womac_001.csv",
            "type": "File"
        }
    ],

Database Table File Lookup

{
    "count": 4,
    "value": [
        {
            "file_name": "surveydetails_koos_001.csv"
        },
        {
            "file_name": "surveydetails_oxford_001.csv"
        },
        {
            "file_name": "surveydetails_eq5d_001.csv"
        },
        {
            "file_name": "surveydetails_womac_001.csv"
        }
    ],

Once I have the file names, I use a ForEach and iterate the Datalake items:

@activity('Datalake Files Metadata').output.childitems

Inside the ForEach I have an IfCondition that checks if the current Datalake file is in the Lookup array. If the Datalake file is not in the array, then a copy activity executes. However, with this construct, I would expect nothing to copy since the items in the datalake are already in the database. I have a feeling that the dynamic content in the IfCondition is off but after searching online for the last few hours I cannot find any obvious errors. Is there logic in the IfCondition off? Is it something else? Please help.

Logic for IfCondition:

@not(contains(activity('Database Table File Lookup').output.value, item().name)) enter image description here

2
Great question! Very much detailed. I have similar problem. Hope you get an answer soon ;) - M. Palmer

2 Answers

1
votes

The Ifcondition logic has item().name which returns a string equal to the file name(Example --> "surveydetails_eq5d_001.csv").

However, activity('Database Table File Lookup').output.value returns an array of dictionaries. Within the dictionaries is "file_name" + the file name. i.e

[
        {
            "file_name": "surveydetails_eq5d_001.csv"
        },
        {
            "file_name": "surveydetails_koos_001.csv"
        },
        {
            "file_name": "surveydetails_oxford_001.csv"
        },
        {
            "file_name": "surveydetails_womac_001.csv"
        }
]

This is why the IF condition is failing.

What you can try is creating 2 new arrays dblist and mytemparray variables. Iterate through the array of the activity('Database Table File Lookup').output.value and extract each file name into your new dblist array(you can use "union" for this). You can then use this array in your Ifcondition check. Do not forget to set "Sequential" switch to ON on the for loop. See the screen shot.enter image description here

Inside my for loop, I have 2 set variable activities.

  1. Activity 1 : Copy dblist to mytemparray
  2. Activity 2 : Combine the mytemparray to file_name and store in dblist @union(variables('mytemparray'),array(item().file_name))
0
votes

Based on some information on Microsoft regarding expressions, and @anupam's comment, I converted the array into a string to do a string vs substring comparison. This simplifies @Anupams answer because another forEach is not required.

@not(contains(string(activity('Survey Details File Lookup').output.value), item().name))