I'm trying to copy data from Azure Data Lake Store, perform some processing, and move it into a different folder in the same data lake using Azure Data Factory. The source data is organized by year, month and date. I only want to copy the latest file every day and keep replacing it. How can I do this using ADF? I see some answers about using slice start and end dates, but that would still mean giving the dates in the JSON file. How do I change those dates automatically?
1 Answer
Assuming you are using ADFv2.
"I see some answers about using slice start and end dates but that would still mean giving the dates in the JSON file. How do I change those dates automatically?"
That's the way to go, and it is automatic: you do not have to hard-code dates in the pipeline. Parameterize the date, attach the pipeline to a tumbling window trigger, and pass the trigger system variable @trigger().outputs.windowStartTime to that parameter. The trigger then supplies the date for each run.
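As a rough sketch of what such a trigger definition could look like, assuming a pipeline named CopyLatestFilePipeline with a string parameter named windowStart (both names are made up for illustration, as are the trigger name and the start time):

```json
{
  "name": "DailyTumblingTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 24,
      "startTime": "2019-02-13T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "type": "PipelineReference",
        "referenceName": "CopyLatestFilePipeline"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime"
      }
    }
  }
}
```

With a 24-hour interval, each window's start time is passed into the pipeline run, so nothing in the pipeline JSON has to be edited from day to day.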
Then set the trigger to a 24-hour window so that each run passes its @trigger().outputs.windowStartTime to the pipeline. Format that value to match your data lake folder structure, e.g. 2019/02/13 (ADF's expression language provides date formatting functions such as formatDateTime), and have the activity read from azuredatalake/2019/02/13/file.txt.
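For example, a parameterized source dataset might build the dated folder path along these lines. This is only a sketch: the dataset name, the linked service name AzureDataLakeStoreLS and the parameter name windowStart are assumptions, and the dataset type shown is the Data Lake Store (Gen1) file dataset, so adjust it to whatever your linked service actually uses.

```json
{
  "name": "SourceDailyFile",
  "properties": {
    "type": "AzureDataLakeStoreFile",
    "linkedServiceName": {
      "referenceName": "AzureDataLakeStoreLS",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "windowStart": { "type": "String" }
    },
    "typeProperties": {
      "folderPath": {
        "value": "@formatDateTime(dataset().windowStart, 'yyyy/MM/dd')",
        "type": "Expression"
      },
      "fileName": "file.txt"
    }
  }
}
```

The copy activity would then pass the pipeline's date parameter into the dataset's windowStart parameter, so a window starting on 13 February 2019 resolves to the folder 2019/02/13.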
Follow this doc to get an idea.