Inside the Data Lake, we have a folder that contains files pushed by an external source every day. However, we want to process only the most recently added file in that folder. Is there any way to achieve that with Azure Data Factory?
2 Answers
You could set modifiedDatetimeStart and modifiedDatetimeEnd to filter the files in the folder when you use the ADLS connector in a Copy activity.
There are two situations:
1. The data is pushed by the external source on a fixed schedule, so you know the time window to configure.
2. The push frequency is random, in which case you would have to log the push time somewhere else and pass it as a parameter into the copy pipeline before executing it.
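As a sketch, the time-window filter sits in the Copy activity's source store settings for the ADLS Gen2 connector. The dataset name, source type, and datetime values below are placeholders you would replace with your own:

```json
{
  "source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
      "type": "AzureBlobFSReadSettings",
      "wildcardFileName": "*",
      "modifiedDatetimeStart": "2019-07-01T00:00:00Z",
      "modifiedDatetimeEnd": "2019-07-02T00:00:00Z"
    }
  }
}
```

Only files whose last-modified time falls inside the window are copied.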
Below is a flow you can build in an ADF pipeline.
My sample files are all in the same folder:
Step 1: create two variables, maxtime and filename:
maxtime is initialized to a cutoff datetime (the earliest modified time you care about); filename is an empty string.
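A sketch of the pipeline's variable declarations (the default value is a placeholder). Note that ADF variables have no datetime type, so maxtime is a String holding an ISO 8601 timestamp; because lastModified is also returned in ISO 8601 format, plain string comparison orders the values correctly:

```json
"variables": {
  "maxtime": { "type": "String", "defaultValue": "2019-07-01T00:00:00Z" },
  "filename": { "type": "String", "defaultValue": "" }
}
```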
Step 2: use a Get Metadata activity and a ForEach activity to enumerate the files under the folder.
GetMetadata 1 configuration:
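A sketch of the first Get Metadata activity: it points at a dataset for the folder (FolderDataset is a placeholder name) and requests the childItems field, which returns the list of files:

```json
{
  "name": "Get Metadata1",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": { "referenceName": "FolderDataset", "type": "DatasetReference" },
    "fieldList": [ "childItems" ]
  }
}
```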
ForEach Activity configuration:
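The ForEach iterates over the childItems output of the previous activity. A sketch, with isSequential set to true, which matters here: the inner Set Variable activities read and update shared pipeline variables, so parallel iterations would race:

```json
{
  "name": "ForEach1",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    }
  }
}
```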
Step 3: inside the ForEach activity, use a Get Metadata activity and an If Condition activity, structured as below:
GetMetadata 2 configuration:
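A sketch of the inner Get Metadata activity: it targets a file-level dataset (FileDataset is a placeholder, described at the end of this answer) whose file name is parameterized with the current ForEach item, and requests the lastModified field:

```json
{
  "name": "Get Metadata2",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "FileDataset",
      "type": "DatasetReference",
      "parameters": { "filename": "@item().name" }
    },
    "fieldList": [ "lastModified" ]
  }
}
```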
If-Condition Activity configuration:
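The If Condition compares the current file's lastModified against the running maximum. Since both values are ISO 8601 strings, a lexicographic comparison with greater() is sufficient. A sketch of the expression:

```
@greater(activity('Get Metadata2').output.lastModified, variables('maxtime'))
```

When the expression is true, the current file is the newest seen so far, so the True branch updates the variables.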
Step 4: inside the If Condition's True branch, use two Set Variable activities:
Set variable1 configuration:
Set variable2 configuration:
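A sketch of the two Set Variable activities: the first records the new maximum modified time, the second records the corresponding file name:

```json
[
  {
    "name": "Set variable1",
    "type": "SetVariable",
    "typeProperties": {
      "variableName": "maxtime",
      "value": {
        "value": "@activity('Get Metadata2').output.lastModified",
        "type": "Expression"
      }
    }
  },
  {
    "name": "Set variable2",
    "type": "SetVariable",
    "typeProperties": {
      "variableName": "filename",
      "value": { "value": "@item().name", "type": "Expression" }
    }
  }
]
```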
All of the steps above aim to find the latest file: once the ForEach completes, the variable filename holds exactly the target file name, which you can then pass into a Copy activity.
In addition, Get Metadata 2 needs a separate dataset whose file name is parameterized:
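A sketch of that parameterized dataset (the dataset name, container, and folder path are placeholders). It declares a filename parameter and references it in the file location, so each ForEach iteration can point it at a different file:

```json
{
  "name": "FileDataset",
  "properties": {
    "type": "DelimitedText",
    "parameters": { "filename": { "type": "String" } },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "mycontainer",
        "folderPath": "incoming",
        "fileName": { "value": "@dataset().filename", "type": "Expression" }
      }
    }
  }
}
```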