0
votes

Question looks fairly simple but in my case it has some intricacies involved in it. So here is the deal:

  • There is one sql database hosted on-premise and one sql Azure hosted database. We need to keep both databases in sync. These databases contain 50 tables.
  • Azure database will not be updated by any application but on premise database will keep on updating frequently. So we need modified/inserted data from on premise database to be moved to Azure database. We are using Azure Data Factory (ADF) for this.

  • All database tables contain one column called LastModifiedDate indicating when was record modified.

  • Currently we have created staging tables corresponding to all 50 tables. We are maintaining one watermark table which contains table name and it's highest LastModifiedDate.
  • There is one activity in ADF job which executes stored procedure which takes records from all tables having LastModifiedDate > corresponding LastModifiedDate in watermark table and dumps them into staging table.
  • When execution of this stored procedure completes, all data from staging table is synced with Azure database tables. Finally watermark table is updated LastModifiedDate for each table. All staging tables are then flushed.
  • This process will keep on repeating periodically, so that whenever data from on-premise is updated, Azure database will also update.

Problems with current approach:

Creating staging table correspoding to each table doesn't look like a good idea. If number of tables in database increases, we need those many corresponding staging table.

Question:

Is there any better approach to handle this scenario with using ADF and without creating huge number of staging tables?

1
Like, a one-time migration or continuous? - Jacob H
It is continuous migration - Tejas Sutar
Data Factory is not the most suitable tool for your use case. It sounds like transactional replication would be better e.g. with SQL Data Sync. ADF is good for integrations but it is not a db replication tool. If you stick to ADF, you will have lots of boilerplate code you need to maintain. This can be made easier with automating code generation. Also note that with ADF you will never be able to have real time replication to Azure. - Veikko
hi - Could be better off using Azure Functions to do this, calling the same stored proc but this time placing updated data into cosmos db, could then use the change feed processor in cosmos to track changes and then populate/replicate your Azure SQL based on Cosmos db or scrap Azure SQL altogether and just use Cosmos Db as your online source. - Mark West

1 Answers

2
votes

You can try using SQL Data Sync instead and make SQL Data Sync just to sync in one direction, from on-premise to Azure SQL Database. When configuring SQL Data Sync choose "To the hub" on "Sync Directions" as shown on below image.

enter image description here