I am trying to grasp how the scheduled activity works in Azure Data Factory, but I don't really understand much yet.
Assume I have a workflow as below:
I have an agent (built as a Windows Service) running on a client's machine, scheduled to extract data from an SAP source daily at 1 AM and put it on Azure Blob storage. The agent extracts only yesterday's data. For example: the agent running at 1 AM today (9 April) extracts only the whole of 8 April's data. This agent is not related to Data Factory.
Assume it takes around 30 minutes for the agent to get the daily data (8 April) and put it in blob storage; it may be more or less depending on how big the data is.
I have a Data Factory pipeline (active forever, from 2016-04-08T01:30:00Z) which uses the blob storage as its input dataset and has one scheduled activity to copy data from blob storage to a database.
The input dataset's availability is set to a daily frequency:
"availability": {
    "frequency": "Day",
    "interval": 1
}
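For context, a minimal sketch of how such an input dataset might be declared in Data Factory (the dataset name, linked service, and folder path are placeholders; since the agent runs outside Data Factory, the dataset would typically also be marked "external": true so the service treats its slices as produced externally):

{
    "name": "BlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "sapdata/{Year}/{Month}/{Day}",
            "format": { "type": "TextFormat" }
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}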
The scheduled activity is set to a daily frequency:
"scheduler": {
    "frequency": "Day",
    "interval": 1
}
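Put together, the copy activity inside the pipeline might look roughly like this (activity and dataset names, source, and sink types are placeholders reflecting the blob-to-database copy described above):

"activities": [
    {
        "name": "BlobToSqlCopy",
        "type": "Copy",
        "inputs": [ { "name": "BlobInput" } ],
        "outputs": [ { "name": "SqlOutput" } ],
        "typeProperties": {
            "source": { "type": "BlobSource" },
            "sink": { "type": "SqlSink" }
        },
        "scheduler": {
            "frequency": "Day",
            "interval": 1
        }
    }
]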
So, based on this workflow, my questions are:
1. After 1:30 AM, the agent finishes the data extraction from SAP and puts the data into blob storage as the input dataset. How does Data Factory know that the data slice for 8 April is ready?
2. What if the data is not ready at 1:30 — will the activity still run at that time?