
Could anyone advise me on how to check the results of my data load from S3 to Snowflake?

The load process is batch-oriented: I drop files into an S3 bucket, and in Snowflake we ingest the data into permanent tables by querying external stages that read from those files.
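
For context, the ingestion step looks roughly like the sketch below; the connection details, table name my_table and stage name my_s3_stage are just placeholders for illustration:

```python
import snowflake.connector

# Placeholder connection details, not my real account.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)

# Load whatever is currently sitting in the external stage into the target table.
conn.cursor().execute("""
    COPY INTO my_table
    FROM @my_s3_stage
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
```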

Once the data from a file has been processed, the file needs to be moved to an archive folder.

I'm struggling with how to complete this last step.

I have a few options in mind, but I'm not sure how good they are.

Option 1 - an external function in Snowflake that raises an event in AWS, which triggers a Lambda function to move the file (a minimal Lambda sketch is included after this list). I think it's a bit flaky.

Option 2 - write the load results into an audit table in Snowflake, then poll this table from AWS and move every file it lists as processed to the archive. This might work, but it's a bit old school and not real time, so it would require extra querying on the Snowflake side to prevent duplicates.

Option 3 - write the file to both the stage folder and the archive folder up front, and have Snowflake delete the staged copy using the purge option of the COPY command. Not ideal and a bit of a workaround.
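
For option 1, the Lambda side could look something like the sketch below. The event shape and the archive/ prefix are assumptions on my part, and since S3 has no native move, the function copies the object and then deletes the original:

```python
import boto3

s3 = boto3.client("s3")
ARCHIVE_PREFIX = "archive/"  # assumed destination prefix

def lambda_handler(event, context):
    # Assumes the triggering event carries the bucket and key of the
    # file that Snowflake has finished processing.
    bucket = event["bucket"]
    key = event["key"]

    # S3 has no rename/move, so copy to the archive prefix and delete the original.
    dest_key = ARCHIVE_PREFIX + key.rsplit("/", 1)[-1]
    s3.copy_object(
        Bucket=bucket,
        Key=dest_key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3.delete_object(Bucket=bucket, Key=key)

    return {"moved": key, "to": dest_key}
```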

Many thanks in advance. I guess there's no need to say that I'm a newbie to Snowflake :-)


2 Answers


All of those options are viable, actually; it just depends on your preference. One note, though, on option 2: you do not need to create an audit table. You can get the load history of a specific table or tables directly in Snowflake with the COPY_HISTORY function: https://docs.snowflake.com/en/sql-reference/functions/copy_history.html
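
For example, something along these lines lists the per-file load status for the last 24 hours (the table name MY_TABLE and the connection details are placeholders):

```python
import snowflake.connector

# Placeholder connection details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="my_schema",
)

# COPY_HISTORY returns one row per loaded file, including its load status.
rows = conn.cursor().execute("""
    SELECT file_name, status, row_count, last_load_time
    FROM TABLE(information_schema.copy_history(
        TABLE_NAME => 'MY_TABLE',
        START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
    ))
""").fetchall()

for file_name, status, row_count, last_load_time in rows:
    print(file_name, status, row_count, last_load_time)
```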


This is how I did it in one of my recent migrations. By the way, I did not understand the need to write the file to S3. You can write directly to an internal stage, which is much safer and more secure: Snowflake automatically encrypts data in internal stages. If you copy the file to S3 (I am assuming you are using an external stage), you are responsible for keeping it secure. Anyway, here is how I incorporated the archive process into my migration.

I used Python to generate a CSV dump from SQL Server using BCP on a Linux server. The Python framework then splits the file and compresses it using the Linux split and gzip commands. After that, it does a PUT to the Snowflake internal stage. I check the output of the PUT and, if it is a success, I move the file to an archive folder on Linux. We have a batch program that runs weekly to clean up the archive folder.
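
A stripped-down sketch of that flow is below. The paths, stage name my_internal_stage and connection details are placeholders, and I have left out the split step for brevity:

```python
import gzip
import shutil
from pathlib import Path

import snowflake.connector
from snowflake.connector import DictCursor

SOURCE_DIR = Path("/data/exports")    # where the CSV dumps land (placeholder path)
ARCHIVE_DIR = Path("/data/archive")   # local archive folder (placeholder path)

# Placeholder connection details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="my_schema",
)
cur = conn.cursor(DictCursor)

for csv_file in SOURCE_DIR.glob("*.csv"):
    # Compress the dump before uploading (mirrors the gzip step above).
    gz_file = csv_file.with_name(csv_file.name + ".gz")
    with open(csv_file, "rb") as src, gzip.open(gz_file, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # PUT the compressed file into a named internal stage; AUTO_COMPRESS is
    # off because the file is already gzipped.
    result = cur.execute(
        f"PUT file://{gz_file} @my_internal_stage AUTO_COMPRESS = FALSE"
    ).fetchone()

    # The PUT output includes a 'status' column, which reads UPLOADED on success.
    if result and result["status"] == "UPLOADED":
        shutil.move(str(gz_file), ARCHIVE_DIR / gz_file.name)
```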