
Scenario: I'm copying data from Azure Table Storage to an Azure SQL DB using an upsert stored procedure like this:

CREATE PROCEDURE [dbo].[upsertCustomer] @customerTransaction dbo.CustomerTransaction READONLY
AS
BEGIN
    MERGE CustomerTransactionstable WITH (HOLDLOCK) AS target_sqldb
    USING @customerTransaction AS source_tblstg
    ON (target_sqldb.customerReferenceId = source_tblstg.customerReferenceId AND
        target_sqldb.Timestamp = source_tblstg.Timestamp)
    WHEN MATCHED THEN
        UPDATE SET
            AccountId = source_tblstg.AccountId,
            TransactionId = source_tblstg.TransactionId,
            CustomerName = source_tblstg.CustomerName
    WHEN NOT MATCHED THEN
        INSERT (
            AccountId,
            TransactionId,
            CustomerName,
            CustomerReferenceId,
            Timestamp
            )
        VALUES (
            source_tblstg.AccountId,
            source_tblstg.TransactionId,
            source_tblstg.CustomerName,
            source_tblstg.CustomerReferenceId,
            source_tblstg.Timestamp
            );
END

GO

where customerReferenceId and Timestamp constitute the composite primary key for CustomerTransactionstable.
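For reference, the setup above implies a table and table type along these lines. This is a hypothetical sketch, not my actual DDL; the column types (in particular whether Timestamp is datetime or datetimeoffset on each side) are assumptions:

```sql
-- Hypothetical sketch of the sink table and the TVP used by the procedure.
-- Column names come from the MERGE above; all types are assumptions.
CREATE TABLE dbo.CustomerTransactionstable (
    AccountId           INT            NOT NULL,
    TransactionId       INT            NOT NULL,
    CustomerName        NVARCHAR(100)  NOT NULL,
    CustomerReferenceId NVARCHAR(50)   NOT NULL,
    [Timestamp]         DATETIME       NOT NULL,
    CONSTRAINT PK_CustomerTransactionstable
        PRIMARY KEY ([Timestamp], CustomerReferenceId)  -- composite key
);

CREATE TYPE dbo.CustomerTransaction AS TABLE (
    AccountId           INT,
    TransactionId       INT,
    CustomerName        NVARCHAR(100),
    CustomerReferenceId NVARCHAR(50),
    [Timestamp]         DATETIMEOFFSET  -- if this type differs from the table's
                                        -- column, the ON clause can fail to match
);
```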

However, when I update the rows in my source (Azure Table Storage) and rerun the Azure Data Factory pipeline, I see this error:

"ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Violation of PRIMARY KEY constraint 'PK_CustomerTransactionstable'. Cannot insert duplicate key in object 'dbo.CustomerTransactionstable'. The duplicate key value is (Dec 31 1990 12:49AM, ABCDEFGHIGK).\r\nThe statement has been terminated.',Source=.Net SqlClient Data Provider,SqlErrorNumber=2627,Class=14,ErrorCode=-2146232060,State=1,Errors=[{Class=14,Number=2627,State=1,Message=Violation of PRIMARY KEY constraint 'PK_CustomerTransactionstable'"

Now, I have verified that there's only one row in both the source and sink with a matching primary key; the only difference is that some columns in my source row have been updated.

This link in the Azure documentation speaks about repeatable copying; however, I don't want to delete rows for a time range from my destination before inserting any data, nor do I have the ability to add a new sliceIdentifierColumn to my existing table or make any other schema change.

Questions:

  1. Is there something wrong with my upsert logic? If yes, is there a better way to do upsert to Azure SQL DBs?
  2. If I choose to use a SQL cleanup script, is there a way to delete only those rows from my Sink that match my primary key?
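On question 2, a cleanup script doesn't have to delete by time range; it can delete only the sink rows whose composite keys appear in the incoming batch. A hedged sketch, assuming the same TVP is available to the cleanup procedure (the procedure name here is made up):

```sql
-- Hypothetical pre-copy cleanup: delete only sink rows whose composite key
-- matches a row in the incoming batch, so subsequent INSERTs cannot collide.
CREATE PROCEDURE dbo.cleanupCustomer @customerTransaction dbo.CustomerTransaction READONLY
AS
BEGIN
    DELETE target_sqldb
    FROM dbo.CustomerTransactionstable AS target_sqldb
    INNER JOIN @customerTransaction AS source_tblstg
        ON  target_sqldb.CustomerReferenceId = source_tblstg.CustomerReferenceId
        AND target_sqldb.[Timestamp] = source_tblstg.[Timestamp];
END
```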

Edit:

This has now been resolved.

Solution:

The primary key violation will only occur if the MERGE tries to insert a record whose primary key already exists. In my case, although there was just one record in the sink, the match condition of the MERGE wasn't being satisfied due to a mismatch between datetime and datetimeoffset fields.
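In other words, the ON clause compared a datetime column against a datetimeoffset value, so the row never "matched" and the MERGE fell through to the INSERT branch. One way to make the comparison explicit is a sketch like this (which side needs converting depends on your actual column types, which are assumptions here):

```sql
-- Align the types explicitly so the MERGE match condition can be satisfied.
-- Assumes the source Timestamp is datetimeoffset and the sink column is
-- datetime; CONVERT drops the offset before comparing and inserting.
MERGE CustomerTransactionstable WITH (HOLDLOCK) AS target_sqldb
USING @customerTransaction AS source_tblstg
ON (target_sqldb.CustomerReferenceId = source_tblstg.CustomerReferenceId AND
    target_sqldb.[Timestamp] = CONVERT(DATETIME, source_tblstg.[Timestamp]))
WHEN MATCHED THEN
    UPDATE SET
        AccountId = source_tblstg.AccountId,
        TransactionId = source_tblstg.TransactionId,
        CustomerName = source_tblstg.CustomerName
WHEN NOT MATCHED THEN
    INSERT (AccountId, TransactionId, CustomerName, CustomerReferenceId, [Timestamp])
    VALUES (source_tblstg.AccountId, source_tblstg.TransactionId,
            source_tblstg.CustomerName, source_tblstg.CustomerReferenceId,
            CONVERT(DATETIME, source_tblstg.[Timestamp]));
```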

1 Answer


Have you tried ADF Mapping Data Flows instead of coding this through a stored procedure? They may make upserts to SQL much easier for you.

With an Alter Row transformation, you can perform Upsert, Update, Delete, and Insert via UI settings by picking a PK: https://docs.microsoft.com/en-us/azure/data-factory/data-flow-alter-row

You would still need a Copy Activity prior to your Data Flow activity to copy the data from Table Storage. Put it in a Blob folder, and the Data Flow can then read its source from there.