
We would like to do incremental loading of files from our on-premises file server to Azure Data Lake using Azure Data Factory v2.

Files are stored in the on-premises file server on a daily basis. We will have to run the ADFv2 pipeline at regular intervals during the day, and only the new, unprocessed files in the folder should be captured.

2 Answers

Our recommendation is to put the set of files for daily ingestion into /YYYY/MM/DD directories. You can refer to this example of how to use the system variable @trigger().scheduledTime to read files from the corresponding directory:

https://docs.microsoft.com/en-us/azure/data-factory/how-to-read-write-partitioned-data
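
As a rough sketch of that approach (the trigger, pipeline, parameter, and folder names below are illustrative assumptions, not taken from the linked doc), a schedule trigger can pass @trigger().scheduledTime into a pipeline parameter, and the dataset's folderPath can be an expression that formats that parameter into the /YYYY/MM/DD path:

    {
        "name": "DailyFilesTrigger",
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Hour",
                    "interval": 4,
                    "startTime": "2018-01-01T00:00:00Z",
                    "timeZone": "UTC"
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": "IncrementalCopyPipeline"
                    },
                    "parameters": { "windowStart": "@trigger().scheduledTime" }
                }
            ]
        }
    }

In the source dataset, folderPath can then resolve to the matching date partition:

    "folderPath": {
        "value": "@concat('dailyfiles/', formatDateTime(pipeline().parameters.windowStart, 'yyyy/MM/dd'))",
        "type": "Expression"
    }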

In the source dataset, you can apply a file filter. You can filter by time, for example (calling a datetime function in the expression language), or by anything else that identifies a new file.

https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions

Then, with a schedule trigger, you can execute the pipeline n times during the day.
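
For instance (a minimal sketch; the file naming pattern and the windowStart parameter are assumptions about how the incoming files might be named, with windowStart passed in from the trigger as in the previous answer), the dataset's fileName can be an expression built with formatDateTime so that each run only reads the file for its own window:

    "fileName": {
        "value": "@concat('export_', formatDateTime(pipeline().parameters.windowStart, 'yyyyMMdd'), '.csv')",
        "type": "Expression"
    }

A schedule trigger with a recurrence such as frequency "Hour" and interval 6 would then run the pipeline four times a day, each run picking up only the files that match its expression.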