3 votes

In Snowflake, there is a concept called Snowpipe, which automatically loads data into configured tables from different data sources.

We are trying to normalize the data while loading it into Snowflake via Snowpipe.

Table A: 
Id & EmployerName

Table B:
Id, EmployeeName & EmployerID

Values in the file:

Name, EmployerName
Raj, Google
Kumar, Microsoft

We are unable to populate Table A and Table B in the same pipe, because a pipe has only one COPY statement.
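For reference, a pipe wraps exactly one COPY statement and therefore targets a single table. A minimal sketch of such a pipe, assuming a hypothetical external stage named employee_stage and a landing table named raw_employees:

    -- One pipe = one COPY statement = one target table.
    -- employee_stage and raw_employees are illustrative names only.
    create pipe load_raw_employees
      auto_ingest = true
    as
      copy into raw_employees
      from @employee_stage
      file_format = (type = 'CSV' skip_header = 1);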

Is there any concept like a dependent pipe, or another way to load the lookup table first and then load the main table from the sample file?

Note:

  • If we have two pipes, we are unable to specify a dependency between them.

3 Answers

2 votes

Snowpipe should be used to load data into tables as soon as the source data is available in the cloud provider's blob storage location. You cannot set up a dependency between Snowpipes, and doing so would add a delay to the pipeline anyway.

Your best bet is to set up two Snowpipes to load both tables as soon as data arrives in blob storage, and then use Snowflake Tasks to handle the dependencies and business logic.
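A rough sketch of that pattern, assuming the Snowpipes have already landed the raw rows in a staging table named stg_employees (table and warehouse names here are illustrative assumptions, not anything from your setup):

    -- Root task builds the employer lookup from the staged rows.
    -- dim_employer.id is assumed to be an AUTOINCREMENT surrogate key.
    create task load_employer_lookup
      warehouse = load_wh
      schedule = '5 minute'
    as
      merge into dim_employer d
      using (select distinct employername from stg_employees) s
        on d.employername = s.employername
      when not matched then insert (employername) values (s.employername);

    -- Child task runs only after the lookup task completes.
    create task load_employee
      warehouse = load_wh
      after load_employer_lookup
    as
      insert into fct_employee (employeename, employerid)
      select s.name, d.id
      from stg_employees s
      join dim_employer d on d.employername = s.employername;

    -- Enable the child task before the root task.
    alter task load_employee resume;
    alter task load_employer_lookup resume;

This is simplified (it reloads everything in the staging table on each run); incremental handling is where the business logic in the tasks would come in.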

1 vote

Just a few ideas:

  1. Set up a Snowpipe to load into a single permanent staging area (PSA) table.

  2. Use hash codes as the surrogate keys for the two separate tables (if you have to use surrogate keys at all). This way you don't have to do lookups for the surrogate key values. Your tables will look like: TableA - EmployerHash, EmployerName; TableB - EmployeeHash, EmployeeName, EmployerHash.

  3. Then create a Task with a stored procedure that issues a multi-table insert, so that you load the two tables at the same time from the same source query; see the sketch after this list. (https://docs.snowflake.net/manuals/sql-reference/sql/insert-multi-table.html#insert-multi-table)

  4. If your real table structures & processing are more complex, you can try Snowflake Streams & Tasks based on the PSA table. For a detailed example, see here: https://community.snowflake.com/s/article/Building-a-Type-2-Slowly-Changing-Dimension-in-Snowflake-Using-Streams-and-Tasks-Part-1
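A rough sketch combining points 2 and 3, assuming the PSA table from point 1 is called psa_employees and that MD5 over the natural keys serves as the hash surrogate key (all names are illustrative):

    -- Unconditional multi-table insert: one source query feeds both tables.
    insert all
      into tablea (employerhash, employername)
        values (employerhash, employername)
      into tableb (employeehash, employeename, employerhash)
        values (employeehash, employeename, employerhash)
    select
      md5(employername) as employerhash,
      employername,
      md5(name)         as employeehash,
      name              as employeename
    from psa_employees;

Repeated employers would land in TableA more than once in this sketch; in practice you would deduplicate them (for example with a MERGE, or a conditional WHEN-based multi-table insert) and wrap the statement in the stored procedure that the Task calls.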

HTH, Gabor

1 vote

I have an idea for copying into multiple tables:

  • You can create a stored procedure that copies the data from the source location to the target table, parameterizing the table name.
  • Use a Task in Snowflake to schedule your stored procedure at a periodic interval.

This will populate the data in your target tables at the given interval. With this option, the file won't be copied immediately from your source location; you will have to check the TASK options for how to get notified on each run.
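A minimal sketch of this idea, assuming a hypothetical stage named my_stage, CSV files with a header row, and a warehouse named load_wh; the table name is passed as a parameter, and the Tasks provide the schedule (chaining the second Task with AFTER is one possible way to load the lookup table first, as the question asks):

    -- Stored procedure: runs COPY INTO for whichever table name is passed in.
    create or replace procedure copy_into_table(table_name string)
      returns string
      language javascript
    as
    $$
      // Build and run the COPY statement for the requested target table.
      var stmt = "copy into " + TABLE_NAME +
                 " from @my_stage file_format = (type = 'CSV' skip_header = 1)";
      snowflake.execute({sqlText: stmt});
      return "Loaded " + TABLE_NAME;
    $$;

    -- Load the lookup table on a schedule, then the main table after it.
    create task copy_tablea
      warehouse = load_wh
      schedule = '15 minute'
    as
      call copy_into_table('TABLEA');

    create task copy_tableb
      warehouse = load_wh
      after copy_tablea
    as
      call copy_into_table('TABLEB');

    alter task copy_tableb resume;
    alter task copy_tablea resume;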