
I'm currently using Data Factory V1.

I have a pipeline with 2 chained activities:

1. The first activity is a Copy Activity that extracts a table from SQL DB into a .tsv file in Data Lake Store.

2. The second activity is a Data Lake Analytics U-SQL activity that takes the data from the previously created .tsv file and adds it to an existing table in the Data Lake database.

Obviously, I only want the second activity to run after the first activity, so I used the output dataset of the first activity as the input dataset of the second activity, and it works fine.
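Simplified, the chaining looks roughly like this in the pipeline JSON (all names, paths and dates below are placeholders, not my actual ones):

{
  "name": "CopyThenUsqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopySqlToTsv",
        "type": "Copy",
        "inputs": [ { "name": "SqlSourceDataset" } ],
        "outputs": [ { "name": "LakeTsvDataset" } ],
        "typeProperties": {
          "source": { "type": "SqlSource" },
          "sink": { "type": "AzureDataLakeStoreSink" }
        }
      },
      {
        "name": "LoadTsvIntoLakeTable",
        "type": "DataLakeAnalyticsU-SQL",
        "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
        "inputs": [ { "name": "LakeTsvDataset" } ],
        "outputs": [ { "name": "LakeTableDataset" } ],
        "typeProperties": {
          "scriptPath": "scripts/LoadTsvIntoTable.usql",
          "scriptLinkedService": "StorageLinkedService"
        }
      }
    ],
    "start": "2017-01-01T00:00:00Z",
    "end": "2017-12-31T00:00:00Z"
  }
}

The second activity takes LakeTsvDataset, the output of the copy, as its input, and that is what creates the dependency.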

But if the first activity fails, the second activity gets stuck in the state "Waiting: Dataset dependencies (The upstream dependencies are not ready)".

I have the policy->timeout property set on the second activity, but it only seems to apply once the activity has actually started. Since the activity never starts, it never times out and stays stuck.
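Concretely, the policy on the second activity is something like this (the timeout and retry values are just examples):

"policy": {
    "timeout": "01:00:00",
    "retry": 2,
    "concurrency": 1
}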

How can I set a timeout for this "waiting" period?

Thank you


1 Answer


That is how V1 works. If your upstream dataset fails, the second activity will stay in the Waiting state until the upstream dataset has completed successfully.

If you are using a schedule, you would want to fix the problem with the first activity and run the failed slice again. If you're working with a one-time pipeline, you have to run the whole pipeline again after fixing the problem.

The timeout only applies once the processing actually starts, as stated in the Data Factory documentation:

If the data processing time on a slice exceeds the timeout value, it is canceled, and the system attempts to retry the processing. The number of retries depends on the retry property. When timeout occurs, the status is set to TimedOut.