I'm setting up a pipeline in Azure Data Factory to take flat files from storage and load them into tables in an Azure SQL DB.
The template for this pipeline requires a start time and an end time, which the tutorial says to set 1 day apart.
I'm trying to understand this. If it were a cron job in Linux or a scheduled task in Windows Server, I'd simply tell it when to start (e.g. daily at 6am) and it would take however long it takes to complete.
This leads me to several related questions:
- Why would I need to specify an end time?
- What if I don't know how long it will take to run?
- If I set it too far in the future, do I run the risk of the data pipeline not completing in a timely manner?
- If I set it too soon, will the pipeline break?
- Why is it hard-coded as a date instead of a frequency (e.g. the template says to use the format "2014-10-14T16:32:41Z")?
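
To make the format concrete: as far as I can tell, that string is an ISO 8601 timestamp in UTC. Below is a minimal Python sketch, my own illustration rather than anything from the tutorial, that builds a start/end pair one day apart in that format. The function name `one_day_window` is hypothetical, and I'm assuming the template accepts exactly this string layout.

```python
from datetime import datetime, timedelta, timezone

def one_day_window(start):
    """Return (start, end) strings one day apart, formatted like the template's example."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed layout, matching "2014-10-14T16:32:41Z"
    start_utc = start.astimezone(timezone.utc)  # normalize to UTC before formatting
    end_utc = start_utc + timedelta(days=1)     # the tutorial's 1-day gap
    return start_utc.strftime(fmt), end_utc.strftime(fmt)

# Example using the timestamp from the template as the start.
start, end = one_day_window(datetime(2014, 10, 14, 16, 32, 41, tzinfo=timezone.utc))
print(start, end)
```

This only shows how the two hard-coded values relate to each other; it doesn't answer why the pipeline needs them.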
I found a prior question that sheds a little light on how to use a frequency instead of hard-coded dates, but its solution doesn't answer my questions above.