3
votes

Inside the Data Lake, we have a folder that contains the files pushed by an external source every day. However, we want to process only the most recently added file in that folder. Is there any way to achieve that with Azure Data Factory?


2 Answers

5
votes

You could set modifiedDatetimeStart and modifiedDatetimeEnd to filter the files in the folder when you use the ADLS connector in a copy activity.

There are two possible situations:

1. The data is pushed by the external source on a schedule. In that case you need to know the schedule so that you can configure the time window accordingly.

2. The push frequency is random. In that case you may have to log the push time somewhere else, then pass that time as a parameter into the copy activity pipeline before you execute it.
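To make the window filter concrete, here is a minimal Python sketch of the selection logic. It is not ADF code: the file names, timestamps, and the `filter_by_modified` helper are all made up for illustration, and it assumes the window includes the start time and excludes the end time.

```python
from datetime import datetime, timezone


def filter_by_modified(files, start, end):
    """Return the names whose last-modified time falls in [start, end)."""
    return [name for name, modified in files if start <= modified < end]


# Hypothetical folder listing: (file name, last-modified time in UTC).
files = [
    ("sales_day1.csv", datetime(2019, 3, 1, 2, 0, tzinfo=timezone.utc)),
    ("sales_day2.csv", datetime(2019, 3, 2, 2, 0, tzinfo=timezone.utc)),
    ("sales_day3.csv", datetime(2019, 3, 3, 2, 0, tzinfo=timezone.utc)),
]

# Window that covers only the most recent scheduled push.
window_start = datetime(2019, 3, 3, 0, 0, tzinfo=timezone.utc)
window_end = datetime(2019, 3, 4, 0, 0, tzinfo=timezone.utc)

selected = filter_by_modified(files, window_start, window_end)
print(selected)
```

If the push schedule is known, the window can be derived from it. If it is random, the start of the window would have to come from the logged push time mentioned in situation 2.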


Here is a flow you could build in an ADF pipeline:

My sample files, all in the same folder:

enter image description here

Step 1: Create two variables, maxtime and filename:

maxtime starts as a baseline datetime for a specific date (the cutoff that file timestamps are compared against), and filename starts as an empty string.

enter image description here

Step 2: Use a GetMetadata activity and a ForEach activity to get the files in the folder.

enter image description here

GetMetadata 1 configuration:

enter image description here

ForEach Activity configuration:

enter image description here

Step 3: Inside the ForEach activity, use a GetMetadata activity and an If-Condition activity, structured as below:

enter image description here

GetMetadata 2 configuration:

enter image description here

If-Condition Activity configuration:

enter image description here

Step 4: Inside the True branch of the If-Condition, use Set Variable activities:

enter image description here

Set variable1 configuration:

enter image description here

Set variable2 configuration:

enter image description here

All of the above steps aim to find the latest file name; when the loop finishes, the variable filename holds exactly that target.
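The logic of steps 1 to 4 can be sketched in plain Python as follows. This is only a simulation of the pipeline's control flow under my own assumptions: `find_latest_file`, the `get_last_modified` callback (standing in for GetMetadata 2), and the sample data are hypothetical, and the If-Condition is assumed to test whether the file's last-modified time is greater than maxtime.

```python
from datetime import datetime, timezone


def find_latest_file(child_items, get_last_modified, baseline):
    """Simulate the pipeline: keep the newest file seen so far."""
    # Step 1: two variables, maxtime (baseline datetime) and filename (empty).
    maxtime = baseline
    filename = ""
    # Step 2: loop over the file names returned by the first GetMetadata.
    for item in child_items:
        # Step 3: second GetMetadata reads this file's last-modified time.
        last_modified = get_last_modified(item)
        # If-Condition: is this file newer than anything seen so far?
        if last_modified > maxtime:
            # Step 4: the two Set Variable activities.
            maxtime = last_modified
            filename = item
    return filename, maxtime


# Hypothetical folder contents keyed by file name.
folder = {
    "a.csv": datetime(2019, 3, 1, 8, 0, tzinfo=timezone.utc),
    "c.csv": datetime(2019, 3, 3, 8, 0, tzinfo=timezone.utc),
    "b.csv": datetime(2019, 3, 2, 8, 0, tzinfo=timezone.utc),
}
baseline = datetime(2019, 1, 1, tzinfo=timezone.utc)

latest_name, latest_time = find_latest_file(folder.keys(), folder.get, baseline)
print(latest_name, latest_time.isoformat())
```

Because each iteration reads and then overwrites the same two variables, the loop in the pipeline presumably has to run sequentially rather than in parallel, otherwise the updates could race. If no file is newer than the baseline, filename stays empty, which is worth checking before the copy step.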


Addition: GetMetadata 2 uses another, new dataset:

enter image description here

1
votes

You can make use of the Modified datetime start and Modified datetime end fields, as shown in the screenshot below.

The example here gets the files modified within the last 24 hours, counting back from the current datetime.
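As a rough illustration of how such a 24-hour window is computed, here is a small Python sketch. The helper names `last_24_hours` and `to_iso_utc` are hypothetical, and the fixed `now` value is just sample data; in ADF itself the equivalent bounds would presumably be written with expression functions such as `adddays(utcnow(), -1)` for the start and `utcnow()` for the end.

```python
from datetime import datetime, timedelta, timezone


def last_24_hours(now):
    """Return (start, end) for the 24-hour window that ends at `now`."""
    return now - timedelta(hours=24), now


def to_iso_utc(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Fixed sample time so the result is reproducible; a real run would use
# datetime.now(timezone.utc) instead.
now = datetime(2019, 3, 4, 12, 30, 0, tzinfo=timezone.utc)
start, end = last_24_hours(now)
print(to_iso_utc(start), to_iso_utc(end))
```

Note that this only narrows the selection to files from the last day. If more than one file arrives within 24 hours, all of them fall inside the window.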

enter image description here