
I am working on a project which uses Azure Data Factory. I have a requirement but am not sure how to implement it.

Requirement:

Source files are generated on an on-premises Windows server (a remote server). We need to check the number of files inside the source folders: if the count is less than expected, the system needs to wait; if the count matches, the system should start the processing pipeline.

Can I achieve this with a PowerShell script? If yes, how can I reference the PowerShell script in my ADF flow?

If we use a runbook to write the PowerShell scripts, how do we call them in ADF before the pipeline runs?


1 Answer


There is no way for Data Factory to execute scripts on premises, as this would be a major security issue. However, you can write a script that runs on-premises every minute/hour and schedule it with Windows Task Scheduler. This script would check the file count and, once it matches the expected number, create a dummy file (say, "ready.txt") in the folder.
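
As a rough sketch, the scheduled script could look like the following. The folder path and the expected file count are placeholders you would adjust for your environment:

```powershell
# Scheduled on the on-prem server via Windows Task Scheduler (e.g. every 5 minutes).
# Drops a marker file once the expected number of source files has arrived.

$sourceFolder  = 'D:\Data\Source'   # placeholder: your source folder
$expectedCount = 10                 # placeholder: the file count you are waiting for
$readyFile     = Join-Path $sourceFolder 'ready.txt'

# Count the data files, excluding the marker file itself
$fileCount = (Get-ChildItem -Path $sourceFolder -File |
              Where-Object { $_.Name -ne 'ready.txt' }).Count

if ($fileCount -ge $expectedCount -and -not (Test-Path $readyFile)) {
    # All files have arrived: create the marker that the ADF pipeline waits for
    New-Item -Path $readyFile -ItemType File | Out-Null
}
```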

You can then create an ADF pipeline with two sequential activities and three datasets:

D1 -> A1 -> D2 + D3 -> A2

  1. The first activity (A1) depends on a dataset (D1) that looks (and waits) for that dummy file. This activity produces a dummy dataset (D2) as output.
  2. The second activity (A2) depends on the second dummy dataset (D2) as well as the real dataset (D3), which is the folder that contains the files you want to copy.
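
To make the wiring concrete, here is a rough sketch of what the pipeline could look like in ADF (v1) JSON. All names (WaitThenCopyPipeline, D1-ReadyFile, and so on) are placeholders; D1 and D3 would be FileShare datasets marked "external": true, reached through a Data Management Gateway linked service, and A2 also needs an output dataset (the copy destination), which I have called D4 here. Check the exact schema against the documentation:

```json
{
  "name": "WaitThenCopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "A1-WaitForReadyFile",
        "type": "Copy",
        "inputs":  [ { "name": "D1-ReadyFile" } ],
        "outputs": [ { "name": "D2-Dummy" } ],
        "typeProperties": {
          "source": { "type": "FileSystemSource" },
          "sink":   { "type": "BlobSink" }
        }
      },
      {
        "name": "A2-CopySourceFiles",
        "type": "Copy",
        "inputs":  [ { "name": "D2-Dummy" }, { "name": "D3-SourceFolder" } ],
        "outputs": [ { "name": "D4-Destination" } ],
        "typeProperties": {
          "source": { "type": "FileSystemSource" },
          "sink":   { "type": "BlobSink" }
        }
      }
    ]
  }
}
```

Because D1 is external and points at "ready.txt", ADF will not consider its slice available (and therefore will not run A1) until the file actually exists on the share.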

When your on-prem script creates the "ready.txt" file, this triggers A1, which produces the dummy dataset D2, which in turn triggers A2, which copies the files from your on-prem folder to wherever you want to put them.

I know it sounds complicated, but it's actually pretty simple. Look here under "Run activities in a sequence" to see most of the JSON you need.