0
votes

I have two tables in snowflake named table1 and table2. Table1 is the source table which contains incremental data and table2 is the target table.

So my usecase is I have to take data from table1 and update the data into table2 but this process has to be done using Azure Data Factory.

I tried to create a data flow in ADF but it didn't allowed me to connect with the snowflake directly as it is not in the supported sources list. The native snowflake connector only supports the Copy Data Activity. So as a work around I first created a copy activity which copy the data from snowflake to azure blob. Then used the Azure Blob as source for Data Flow to create my scd1 implementation and saved the output in csv files.

Now My question is how should I update the data in target table2. Because If I directly use the copy activity to copy the csv files into snowflake then it will result in the duplicate records at snowflake side. For instance lets say table2 contains a row

id,name,age,data
1234,kristopher,24,somedata

and table1 contains

id,name,age,data
1234,kristopher,24,some-new-data

So now I have table1 data in csv which has to be loaded in snowflake. If I am loading directly then the resultant looks something like this.

id,name,age,data
1234,kristopher,24,somedata
1234,kristopher,24,some-new-data

But I only need

1234,kristopher,24,some-new-data

Let me know if some more explanation is required. I am new to Azure Data Factory and Snowflake as well.

Thanks

2

2 Answers

0
votes

As you have observed, the ADF Data Flows currently don't support Snowflake datasets as a source.

You could theoretically follow this design pattern but it seems like alot of work for the requirement you have described. An alternative would be to go down the Azure Function route, but again I would trade off the requirement vs. effort required.

If it didn't have to be in ADF, then a quick approach would be to use a Snowflake Task to schedule some SQL to manage the SCD behavior for you.

I hope this helps.

Best regards, Dan.

0
votes

you can put your login in a snowflake stored procedure, then execute your stored proc in ADF