0
votes

I am new to the Snowflake data warehouse and want to load data using Snowpipe with auto-ingest from an external stage (an S3 bucket) that contains multiple folders, one for each database; inside those folders there are multiple files, one for each table. The files in the different folders will be updated at different times, and I'm currently using auto-ingest to go from S3 to SQS to Snowpipe.

My example S3 bucket is setup as follows:

myS3bucket.s3.amazonaws.com
    /db1
        account.csv
        product.csv
    /db2
        invoice.csv
        employees.csv

My current understanding is that I would create one pipe per table: account.csv will relate to one pipe that populates the account table, product.csv will relate to a different pipe that populates the product table, and so on.
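Something like the following is what I have in mind (the stage name, storage integration, and file format are placeholders, not my real setup):

    -- Placeholder names: assumes a storage integration my_s3_int already exists.
    CREATE STAGE my_s3_stage
      URL = 's3://myS3bucket/'
      STORAGE_INTEGRATION = my_s3_int;

    -- One auto-ingest pipe per table, each scoped to a single file path.
    CREATE PIPE account_pipe AUTO_INGEST = TRUE AS
      COPY INTO account
      FROM @my_s3_stage/db1/account.csv
      FILE_FORMAT = (TYPE = 'CSV');

    CREATE PIPE product_pipe AUTO_INGEST = TRUE AS
      COPY INTO product
      FROM @my_s3_stage/db1/product.csv
      FILE_FORMAT = (TYPE = 'CSV');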

At the moment it looks like all of the pipes will be triggered whenever I add a new file anywhere in the bucket, because the notification is set on the whole bucket.

My questions are:

  • Is it correct that all pipes are triggered when I add a new file? Do I need to set up db1 and db2 as different external stages, so that if I add a file to the db1 folder only the pipes linked to that stage will read it?
  • I've read that SNS can be used, but I'm unsure how it works with Snowpipe. Does S3 send a message so Snowflake can work out which pipe to call, and then fan the message out to that pipe?
  • Is there a better approach than what I'm using above?

Thanks for any help


2 Answers

1
votes

If your stages include the folder path, then when S3 sends a message that a file has been added, Snowflake works out which stages are affected, so only the pipes pointed at that folder will pull in the file.

For example, if you have three pipes with three stages, and those stages point at these folders:

  • 1: blahblah
  • 2: blahblah/db1
  • 3: blahblah/db2

then when a file is created in blahblah/db1, it will trigger pipes 1 and 2, but not pipe 3.
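As a sketch (the stage and pipe names, the integration, and the target tables are all made up for illustration):

    -- Three stages scoped at different depths of the same bucket.
    CREATE STAGE stage_1 URL = 's3://blahblah/'     STORAGE_INTEGRATION = my_s3_int;
    CREATE STAGE stage_2 URL = 's3://blahblah/db1/' STORAGE_INTEGRATION = my_s3_int;
    CREATE STAGE stage_3 URL = 's3://blahblah/db2/' STORAGE_INTEGRATION = my_s3_int;

    -- One auto-ingest pipe per stage.
    CREATE PIPE pipe_1 AUTO_INGEST = TRUE AS
      COPY INTO t1 FROM @stage_1 FILE_FORMAT = (TYPE = 'CSV');
    CREATE PIPE pipe_2 AUTO_INGEST = TRUE AS
      COPY INTO t2 FROM @stage_2 FILE_FORMAT = (TYPE = 'CSV');
    CREATE PIPE pipe_3 AUTO_INGEST = TRUE AS
      COPY INTO t3 FROM @stage_3 FILE_FORMAT = (TYPE = 'CSV');

    -- A new file at s3://blahblah/db1/x.csv matches stage_1 and stage_2,
    -- so pipe_1 and pipe_2 load it; pipe_3 is not triggered.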

1
votes

When using auto-ingest, Snowflake creates one SQS queue per Snowflake account. All notifications from S3 go to that same queue and are then passed on to each pipe based on the pipe's stage definition (bucket and path). If the stages have overlapping paths, SQS will potentially pass the notification to multiple pipes.
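You can see this on your own account: the queue's ARN appears in the notification_channel column of the pipe metadata, and it is the same for every auto-ingest pipe in the account. For example (the pipe name is illustrative):

    -- notification_channel shows the shared SQS queue ARN.
    SHOW PIPES;

    -- Or inspect a single pipe:
    DESC PIPE account_pipe;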

See the "Important" section under "Create a Pipe with Auto-Ingest Enabled" in the "Automating Snowpipe for Amazon S3" documentation.

https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html#step-2-create-a-pipe-with-auto-ingest-enabled