I'm following a tutorial on Azure Data Factory migration from Azure SQL to Blob through pipelines. While most of the concepts make sense, the 'Copy Data' query is a bit confusing. I have a background in writing Oracle SQL, but Azure SQL on ADF is pretty different and I'm struggling to find specific technical documentation, probably because it's not widely adopted yet.
Pipeline configuration shown below:
Query is posted below:
SELECT data_source_table.PersonID,data_source_table.Name,data_source_table.Age,
CT.SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION
FROM data_source_table
RIGHT OUTER JOIN CHANGETABLE(CHANGES data_source_table,
@{activity('LookupLastChangeTrackingVersionActivity').output.firstRow.SYS_CHANGE_VERSION})
AS CT ON data_source_table.PersonID = CT.PersonID
WHERE CT.SYS_CHANGE_VERSION <=
@{activity('LookupCurrentChangeTrackingVersionActivity').output.firstRow.CurrentChangeTrackingVersion}
Output to the sink Blob as a result of the 'Copy Data' query:
2,name2,14,4,U
7,name7,51,3,I
8,name8,31,5,I
9,name9,38,6,I
Couple questions I had:
- There's a lot of external referencing from other activities in the 'Copy Data' query like
@{activity('...').output.firstRow.CurrentChangeTrackingVersion. Is there a way to know the appropriate syntax to referencing external activities? Can't find any good documentation the syntax, like what.firstRowis or what thechangetableoutput looks like. I can't replicate this query in SSMS, which makes it a bit of a black box for me. - SYS_CHANGE_OPERATION appears in the
SELECTwith no table name prefix. Is this directly querying from the table in SourceDataset? (It points to data_source_table, which has table tracking enabled) My main confusion stems from how table tracking information is stored in the enabled tables. Is there a way to show all the table's tracked changes in SSMS? I see some documentation on what the return values, but it's hard for me to visualize it without seeing it on the table, so an output query of some return values would be nice. - LookupLastChangeTracking activity queries in all rows from a table (which when I checked, is just one row), but LookupCurrentChangeTracking activity uses a CHANGE_TRACKING function to pull the version of the data sink in table_store_ChangeTracking_version. Why does it use a function when the data sink's version is already recorded in table_store_ChangeTracking_version?
Sorry for the many questions, but I can't find any way to make this learning curve a bit less steep. Any guides or resources would be awesome!