
I am trying to find the best way to copy yesterday's data from DocumentDB to Azure SQL.

I have a working DocumentDB database that records data gathered via a web service. I would like to routinely (daily) copy all new records from DocumentDB to an Azure SQL DB table. To that end I have created and successfully executed an Azure Data Factory pipeline that copies records with a datetime > '2018-01-01', but I have only ever gotten it to work with a hard-coded date, never with a date taken from a variable.

My research on DocumentDB SQL querying shows that it has mathematical, type-checking, string, array, and geospatial functions, but no date-time function equivalent to SQL Server's GETDATE().

I understand that Data Factory pipelines expose some system variables and functions, including utcnow(). I cannot figure out, though, how to actually use them by editing the JSON. If I simply include utcnow() in the query, DocumentDB returns the error "'utcnow' is not a recognized built-in function name":

"query": "SELECT * FROM c where c.StartTimestamp > utcnow()",

If I instead try to build the string within the JSON using utcnow(), I can't even save the pipeline because of a syntax error:

"query": "SELECT * FROM c where c.StartTimestamp > " + utcnow(),

I am willing to try a technology other than a Data Factory pipeline, but I have a lot of data in our DocumentDB, so I'm not interested in abandoning that. I'm also much more familiar with SQL programming, and I need the data in Azure SQL for joins and other analysis.

What is the easiest and best way to copy those new entries into the staging table in Azure SQL every day?


1 Answer


Are you using ADF V2 or V1?

For ADF V2, I think you can follow the incremental-copy approach that Microsoft recommends. Keep a watermark table (it could live in your target Azure SQL database) and use two Lookup activities: one to obtain the previous run's watermark value (a date, an integer, whatever your audit value is), and another to obtain the MAX of the watermark column (i.e. the max date) in your source documents. Then have a Copy activity that gets all records where c.StartTimestamp <= MaxWatermarkValueFromSource AND c.StartTimestamp > LastWatermarkValue.
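In pipeline JSON that pattern looks roughly like the sketch below. This is only a sketch: the activity, dataset, and column names (LookupOldWatermark, WatermarkDataset, SourceDocDbDataset, StagingSqlDataset, NewWatermarkValue) are placeholders of mine, not anything ADF requires.

{
  "name": "IncrementalCopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "LookupOldWatermark",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "SqlSource",
            "sqlReaderQuery": "SELECT WatermarkValue FROM watermarktable"
          },
          "dataset": { "referenceName": "WatermarkDataset", "type": "DatasetReference" }
        }
      },
      {
        "name": "LookupNewWatermark",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "DocumentDbCollectionSource",
            "query": "SELECT MAX(c.StartTimestamp) AS NewWatermarkValue FROM c"
          },
          "dataset": { "referenceName": "SourceDocDbDataset", "type": "DatasetReference" }
        }
      },
      {
        "name": "IncrementalCopy",
        "type": "Copy",
        "dependsOn": [
          { "activity": "LookupOldWatermark", "dependencyConditions": [ "Succeeded" ] },
          { "activity": "LookupNewWatermark", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "source": {
            "type": "DocumentDbCollectionSource",
            "query": "SELECT * FROM c WHERE c.StartTimestamp > '@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}' AND c.StartTimestamp <= '@{activity('LookupNewWatermark').output.firstRow.NewWatermarkValue}'"
          },
          "sink": { "type": "SqlSink" }
        },
        "inputs": [ { "referenceName": "SourceDocDbDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagingSqlDataset", "type": "DatasetReference" } ]
      }
    ]
  }
}

Note the @{...} string-interpolation syntax in the Copy activity's query: that is how ADF V2 expressions are embedded in an otherwise literal query string, and it is also how a system function like utcnow() would be used. The expression is evaluated by Data Factory before the query is sent to DocumentDB, which is why calling utcnow() inside the bare query failed for you. The tutorial also finishes with a StoredProcedure activity that writes the new watermark value back to the watermark table after a successful copy.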

I followed this example using the Python SDK and it worked for me:

https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-powershell
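To run the pipeline daily, you can attach a schedule trigger to it. A minimal sketch, assuming the pipeline name from above (the trigger name and start time are placeholders):

{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2018-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": { "referenceName": "IncrementalCopyPipeline", "type": "PipelineReference" }
      }
    ]
  }
}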