1
votes

I'm using Azure Data Factory to copy data from Azure Cosmos DB to Azure Data Lake. My pipeline consists of a copy activity which copies data to the Data lake sink.

This is my query on the source dataset:

select * from c 
where c.data.timestamp >= '@{formatDateTime(addminutes(pipeline().TriggerTime, -15), 'yyyy-MM-ddTHH:mm:ssZ' )}' 
AND c.data.timestamp < '@{formatDateTime(pipeline().TriggerTime, 'yyyy-MM-ddTHH:mm:ssZ' )}'

I'm getting the data for the last 15 minutes before the trigger time.

Now, if there is no data retrieved by the query then the copy activity generates an empty file and stores it in the data lake. I want to prevent that. Is there any way I can achieve this?

3

3 Answers

4
votes

You could use lookup activity and then use an if activity to decide whether you need to run the copy activity.

In the lookup activity, you could set firstRowOnly as true since you only want to check whether there are data.

enter image description here

1
votes

This is an older thread but someone might have a more elegant way to handle the issue above that ADF produces a file even there are 0 records. Here are my concerns with the Lookup approach or having a post-process clean up the empty file.

  1. It's inefficient to query database twice just to check if there are rows the first time.
  2. Using the [IF Condition] componenet is not possible if you are already inside an [if component] or [case] component of ADF. (This is an ADF constraint/shortcoming also).
  3. Cleaning up the empty file is also inefficient, and not an option if you are triggering off the event of the file being created since it causes a false-positive as it is written before you can clean it up.
0
votes

I tried the following and it is working: I'm checking if the lookup entry returns more than 0 rows.

enter image description here