Scenario: I'm copying data from Azure Table Storage to an Azure SQL DB using an upsert stored procedure like this:
CREATE PROCEDURE [dbo].[upsertCustomer] @customerTransaction dbo.CustomerTransaction READONLY
AS
BEGIN
MERGE customerTransactionstable WITH (HOLDLOCK) AS target_sqldb
USING @customerTransaction AS source_tblstg
ON (target_sqldb.customerReferenceId = source_tblstg.customerReferenceId AND
target_sqldb.Timestamp = source_tblstg.Timestamp)
WHEN MATCHED THEN
UPDATE SET
AccountId = source_tblstg.AccountId,
TransactionId = source_tblstg.TransactionId,
CustomerName = source_tblstg.CustomerName
WHEN NOT MATCHED THEN
INSERT (
AccountId,
TransactionId,
CustomerName,
CustomerReferenceId,
Timestamp
)
VALUES (
source_tblstg.AccountId,
source_tblstg.TransactionId,
source_tblstg.CustomerName,
source_tblstg.CustomerReferenceId,
source_tblstg.Timestamp
);
END
GO
where customerReferenceId & Timestamp constitute the composite key for the CustomerTransactionstable
However, when I update the rows in my source(Azure table) and rerun the Azure data factory, I see this error:
"ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Violation of PRIMARY KEY constraint 'PK_CustomerTransactionstable'. Cannot insert duplicate key in object 'dbo.CustomerTransactionstable'. The duplicate key value is (Dec 31 1990 12:49AM, ABCDEFGHIGK).\r\nThe statement has been terminated.',Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY constraint 'PK_CustomerTransactionstable'"
Now, I have verified that there's only one row in both the source and sink with a matching primary key, the only difference is that some columns in my source row have been updated.
This link in the Azure documentation speaks about the repeatable copying, however I don't want to delete rows for a time range from my destination before inserting any data nor do I have the ability to add a new sliceIdentifierColumn to my existing table or any schema change.
Questions:
- Is there something wrong with my upsert logic? If yes, is there a better way to do upsert to Azure SQL DBs?
- If I choose to use a SQL cleanup script, is there a way to delete only those rows from my Sink that match my primary key?
Edit:
This has now been resolved.
Solution:
The primary key violation will only occur if it's trying to insert a record which already has a matching primary key. In my case there although there was just one record in the sink, the condition on which the merge was being done wasn't getting satisfied due to mismatch between datetime and datetimeoffset fields.