I am trying to grasp how the scheduled activity works in Azure Data Factory, but I don't really understand much yet.
Assume I have a workflow as below:
I have an agent (built as a Windows Service) running on a client's machine, scheduled to extract data from an SAP source daily at 1 AM and put it on Azure Blob storage. The agent extracts only yesterday's data. For example: the agent running at 1 AM today (9 April) extracts only the whole of 8 April's data. This agent is not related to Data Factory.
Assume it takes around 30 minutes for the agent to get the daily data (8 April) and put it in blob storage; it may be more or less depending on how big the data is.
I have a Data Factory pipeline (active forever, from 2016-04-08T01:30:00Z) which uses the blob storage as its input dataset and has one scheduled activity to copy data from blob storage to a database.
The input dataset's availability is set to a daily frequency:
"availability": {
    "frequency": "Day",
    "interval": 1
}
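For context, a minimal sketch of how such an input dataset might be declared in Data Factory (the dataset name, linked service, and folder path are placeholders; since the agent runs outside Data Factory, the dataset would typically also be marked "external": true so the service treats its slices as produced externally):

{
    "name": "BlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "sapdata/{Year}/{Month}/{Day}",
            "format": { "type": "TextFormat" }
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}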
The scheduled activity is set to a daily frequency:
"scheduler": {
    "frequency": "Day",
    "interval": 1
}
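Put together, the copy activity inside the pipeline might look roughly like this (activity and dataset names, source, and sink types are placeholders reflecting the blob-to-database copy described above):

"activities": [
    {
        "name": "BlobToSqlCopy",
        "type": "Copy",
        "inputs": [ { "name": "BlobInput" } ],
        "outputs": [ { "name": "SqlOutput" } ],
        "typeProperties": {
            "source": { "type": "BlobSource" },
            "sink": { "type": "SqlSink" }
        },
        "scheduler": {
            "frequency": "Day",
            "interval": 1
        }
    }
]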
So, based on this workflow, my questions are:
1. After 1:30 AM, the agent finishes the data extraction from SAP and puts the data into blob storage as the input dataset. How does Data Factory know that the data slice for 8 April is ready?
2. What if the data is not ready at 1:30 — will the activity still run at that time?