1
votes

I have some data stored in an S3 bucket and I want to load it into one of my Snowflake DBs. Could you help me to better understand the 2 following points please :

  1. From the documentation (https://docs.snowflake.com/en/user-guide/data-load-s3.html), I see it is better to first create an external stage before loading the data with the COPY INTO operation, but it is not mandatory. ==> What is the advantage/usage of creating this external step and what happen under the hood if you do not create it ==> In the COPY INTO doc, it is said that the data must be staged beforehand. If the data is not staged, Snowflake creates a temporary stage ?

  2. If my S3 bucket is not in the same region as my Snowflake DB, is it still possible to load the data directly, or one must first transfert the data to another S3 bucket in the same region as the Snowflake DB ? I expect it is still possible but slower because of network transfert time ?

Thanks in advance

1
1. It is not mandatory but recommended if you want to do regularly bulk loading. See here how to load directly if it's a one time off. 2. Not mandatory either, but can be slower indeed due to network traffic.Sergiu
In general, data must always be staged before Snowflake can load it (you can also load small files through the WebUI). Snowflake automatically creates a "user stage" for every user and a "table stage" for every table; copying data into these stages then allows it to be loaded into a Snowflake table (user stage to any table, table stage only to the associated table). A user can create named stages and these can be internal or external: internal obviously sits within your Snowflake account; external points to a supported Cloud location (S3, Azure, Google Cloud Storage)NickW
@NickW This isn't really true. I believe the question was what the benefit of creating a stage object is. You do not need to define an external stage to load data from an S3 bucket. You can directly reference the S3 bucket and get data loaded without a stage object.Mike Walton
@MikeWalton Thanks for the correction - always useful to know when I am wrong! Your answer is pretty comprehensive (and accurate, compared to mine) - the only other significant advantage of a stage that I can think of is that you can query it directly. I don't believe you could query external files without creating a stage (or an External Table)?NickW

1 Answers

0
votes
  1. The primary advantages of creating an external stage is the ability to tie a file format directly to the stage and not have to worry about defining it on every COPY INTO statement. You can also tie a connection object that contains all of your security information to make that transparent to your users. Lastly, if you have a ton of code that references the stage, but you wind up moving your bucket, you won't need to update any of your code. This is nice for Dev to Prod migrations, as well.

  2. Snowflake can load from any S3 bucket regardless of region. It might be a little bit slower, but not any slower than it'd be for you to copy it to a different bucket and then load to Snowflake. Just be aware that you might incur some egress charges from AWS for moving data across regions.