0 votes

I have a table in a SQL DB with primary key fields. I am using a Copy activity in Azure Data Factory with a source dataset (JSON).

We are writing this data into the sink dataset (SQL DB), but the pipeline is failing with the error below:

"message": "'Type=System.Data.SqlClient.SqlException,Message=Violation of 
 PRIMARY KEY constraint 'PK__field__399771B9251AD6D4'. Cannot 
 insert duplicate key in object 'dbo.crop_original_new'. The 
 duplicate key value is (9161, en).\r\nThe statement has been 
 terminated.,Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors= 
[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY 
constraint 'PK__field__399771B9251AD6D4'. Cannot insert 
duplicate key in object 'Table'. The duplicate key value is 
(9161, en).,},{Class=0,Number=3621,State=0,Message=The statement has 
been terminated.,},],'",
You need to write to a staging environment and then update your SQL DB from there, checking for duplicates or updates along the way. – iamdave
Can you please show us some of your JSON dataset? I tried it in "Copy data"; there is no duplicates option in the settings. i.stack.imgur.com/IJiY3.png – Leon Yue

2 Answers

0 votes

Well, the cleanest solution would be:

  • Create a staging table stg_table in your SQL environment (this table should have a different key policy, e.g. no conflicting primary key)
  • Load the data from the JSON source into stg_table
  • Write a stored procedure that removes duplicates and loads the data into your destination table (see the sketch after this list)
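A minimal T-SQL sketch of that approach. It assumes the destination is dbo.crop_original_new with a composite key on (crop_id, language), inferred from the duplicate key value (9161, en) in the error; the staging table, column names, and procedure name are only illustrative:

    -- Staging table without the primary key, so the Copy activity can load every row
    CREATE TABLE dbo.stg_crop_original_new (
        crop_id   INT           NOT NULL,
        language  NVARCHAR(10)  NOT NULL,
        crop_name NVARCHAR(200) NULL
    );
    GO

    -- Stored procedure that de-duplicates the staging rows and upserts them into the target
    CREATE PROCEDURE dbo.usp_load_crop_original_new
    AS
    BEGIN
        SET NOCOUNT ON;

        WITH dedup AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY crop_id, language ORDER BY crop_id) AS rn
            FROM dbo.stg_crop_original_new
        )
        MERGE dbo.crop_original_new AS tgt
        USING (SELECT crop_id, language, crop_name FROM dedup WHERE rn = 1) AS src
            ON tgt.crop_id = src.crop_id AND tgt.language = src.language
        WHEN MATCHED THEN
            UPDATE SET tgt.crop_name = src.crop_name
        WHEN NOT MATCHED THEN
            INSERT (crop_id, language, crop_name)
            VALUES (src.crop_id, src.language, src.crop_name);

        TRUNCATE TABLE dbo.stg_crop_original_new;  -- reset staging for the next run
    END
    GO

In the pipeline you would point the Copy activity's sink at stg_crop_original_new and add a Stored Procedure activity after it that calls usp_load_crop_original_new.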

Or, if you are familiar with Mapping Data Flows in ADF, you can check this article by Mark Kromer.

0 votes

You can use the fault tolerance setting provided in the Copy activity to skip incompatible rows.

Setting image
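In the Copy activity's JSON definition this corresponds to the fault tolerance properties; a rough sketch, where the source/sink types, the linked service name MyBlobStorage, and the log path errorlogs are placeholders for your own setup:

    "typeProperties": {
        "source": { "type": "JsonSource" },
        "sink":   { "type": "AzureSqlSink" },
        "enableSkipIncompatibleRow": true,
        "redirectIncompatibleRowSettings": {
            "linkedServiceName": {
                "referenceName": "MyBlobStorage",
                "type": "LinkedServiceReference"
            },
            "path": "errorlogs"
        }
    }

With enableSkipIncompatibleRow set to true, rows that violate the primary key are skipped instead of failing the pipeline, and the optional redirectIncompatibleRowSettings writes the skipped rows to storage so you can inspect them later.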