I am able to successfully copy data from sql Azure to Azure cosmos DB using data factory. After first copy I want to keep it updated from the same sql source but Here is the problem:
Lets assume I have a SQL table like this:
Col1 Col2
val11 val21
val12 val22
val13 val21
val11 val23
In data factory copy activity I chose SQL as source and cosmos DB as destination. I chose upsert option for copying.
In the mapping section of azure data factory I setup the col1 mapped to /primarykey of my cosmosdb collection and col2 mapped to col2.
Now I executed the data factory pipeline. It successfully copied it and items look like following:
ID PartitionKey Document `SomeGuid1` val11 { Col2: val21 } `SomeGuid2` val12 { Col2: val22 } `SomeGuid3` val13 { Col2: val21 } `SomeGuid4` val11 { Col2: val23 }
- Until this point it is fine. But let's say SQL data has changed and first row has become (val11 val25(Changed)). Now I want to copy this sql table again to cosmos DB.
- When I run the data factory pipeline again then it copies all the rows again and duplicate the data.
ID PartitionKey Document `SomeGuid1` val11 { Col2: val21 } `SomeGuid2` val12 { Col2: val22 } `SomeGuid3` val13 { Col2: val21 } `SomeGuid4` val11 { Col2: val23 } `SomeGuid5` val11 { Col2: val25 } >-- changed value `SomeGuid6` val12 { Col2: val22 } `SomeGuid7` val13 { Col2: val21 } `SomeGuid8` val11 { Col2: val23 }
But I don't want this to be duplicated. I want this to just copy and replace the old data. So I want data after 2nd copy task it to be:
ID PartitionKey Document `SomeGuid5` val11 { Col2: val25 } >-- changed value `SomeGuid6` val12 { Col2: val22 } `SomeGuid7` val13 { Col2: val21 } `SomeGuid8` val11 { Col2: val23 }
I think it is duplicating all the rows because upsert checks the Id property but since source doesn't have Id it generates one, and in 2nd copy also it generates new Ids and adds the new documents.
The way we do it in SQL is we copy in a temp table and then swap the table name with original table after full copy. But cosmos db doesn't support renaming the container: https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/19466575-add-ability-to-rename-databases-and-collections
Any help will be highly appreciated. We like cosmos db and we will love to use it.