0
votes

I'm following a tutorial on Azure Data Factory migration from Azure SQL to Blob through pipelines. While most of the concepts make sense, the 'Copy Data' query is a bit confusing. I have a background in writing Oracle SQL, but Azure SQL on ADF is pretty different and I'm struggling to find specific technical documentation, probably because it's not widely adopted yet.

Pipeline configuration shown below: enter image description here Query is posted below:

SELECT data_source_table.PersonID,data_source_table.Name,data_source_table.Age, 
CT.SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION
FROM data_source_table 
RIGHT OUTER JOIN CHANGETABLE(CHANGES data_source_table, 
@{activity('LookupLastChangeTrackingVersionActivity').output.firstRow.SYS_CHANGE_VERSION}) 
AS CT ON data_source_table.PersonID = CT.PersonID 
WHERE CT.SYS_CHANGE_VERSION <= 
@{activity('LookupCurrentChangeTrackingVersionActivity').output.firstRow.CurrentChangeTrackingVersion}

Output to the sink Blob as a result of the 'Copy Data' query:

2,name2,14,4,U
7,name7,51,3,I
8,name8,31,5,I
9,name9,38,6,I

Couple questions I had:

  1. There's a lot of external referencing from other activities in the 'Copy Data' query like @{activity('...').output.firstRow.CurrentChangeTrackingVersion. Is there a way to know the appropriate syntax to referencing external activities? Can't find any good documentation the syntax, like what .firstRow is or what the changetable output looks like. I can't replicate this query in SSMS, which makes it a bit of a black box for me.
  2. SYS_CHANGE_OPERATION appears in the SELECT with no table name prefix. Is this directly querying from the table in SourceDataset? (It points to data_source_table, which has table tracking enabled) My main confusion stems from how table tracking information is stored in the enabled tables. Is there a way to show all the table's tracked changes in SSMS? I see some documentation on what the return values, but it's hard for me to visualize it without seeing it on the table, so an output query of some return values would be nice.
  3. LookupLastChangeTracking activity queries in all rows from a table (which when I checked, is just one row), but LookupCurrentChangeTracking activity uses a CHANGE_TRACKING function to pull the version of the data sink in table_store_ChangeTracking_version. Why does it use a function when the data sink's version is already recorded in table_store_ChangeTracking_version?

Sorry for the many questions, but I can't find any way to make this learning curve a bit less steep. Any guides or resources would be awesome!

1

1 Answers

0
votes

There is an article to get the same thing done from the UI and it will help you understand it better . https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal .

1 . These are the Lookup activity ,. very straight forward , please read about them here . https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity

2.SYS_CHANGE_OPERATION is a column on data_source_table and so that should be fine . Regarding the details on the how the change tracking (CT) is stored , I am not sure if all the system table are exposed on Azure SQL , but we did had few table on the on-prem version of the SQL which could be queried if needed . But for this exercise I think that will be an over kill .