
In the Snowflake documentation about bulk loading from AWS S3, it says:

You can load directly from the bucket, but Snowflake recommends creating an external stage that references the bucket and using the external stage instead.

So my first question is: Why does Snowflake recommend creating an external stage rather than loading directly from a bucket? Is there a reason for this? If you know of any documentation explaining why, please let me know. :)

And my second question is: In the architecture diagram for Bulk Loading from a Local File System, there are arrows (➡) from the data files to the stage, but in the diagram for Bulk Loading from Amazon S3, there are no arrows from the data files to the external stage. What is the difference between the diagrams with and without arrows?

Bulk Loading from Amazon S3: https://docs.snowflake.com/en/user-guide/data-load-s3.html

Bulk Loading from a Local File System: https://docs.snowflake.com/en/user-guide/data-load-local-file-system.html

Considering that S3 consistency guarantees were improved just recently, direct loading from S3 may now be "more OK" than before. – oakad

1 Answer


The stage holds all the permissions for the bucket, so a security role can deal with the AWS tokens and then grant read/write access on the stage to other roles. This separates the two tasks of loading data and securing data.
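A minimal sketch of that separation of duties, assuming hypothetical names (`my_s3_stage`, `securityadmin_role`, `etl_role`, and placeholder credentials):

```sql
-- A privileged security role creates the stage and embeds the AWS credentials.
-- The ETL roles never see the keys themselves.
CREATE STAGE my_s3_stage
  URL = 's3://my-bucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '<key_id>' AWS_SECRET_KEY = '<secret_key>');

-- Then the security role grants access to the stage, not the bucket,
-- to the roles that load data.
GRANT USAGE ON STAGE my_s3_stage TO ROLE etl_role;
```

The ETL code then references the stage (`COPY INTO my_table FROM @my_s3_stage`) and never touches AWS credentials directly.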

It also allows the stage's tokens to be changed/updated without impacting the code or users that reference it, or even allows switching to methods where (the name escapes me, but) dynamic key exchange happens, so key rotation is fully automatic between Snowflake and AWS. This is how we do it: in fact, we have many stages for different sources of data, and the security aspects and business policies do not need to be known or handled by the data engineers who build the ETL code.
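For example, rotating the keys is a change to the stage object only; anything that reads from `@my_s3_stage` keeps working unchanged (stage name and credential values here are hypothetical):

```sql
-- Rotate credentials in one place; no ETL code or COPY statements change.
ALTER STAGE my_s3_stage
  SET CREDENTIALS = (AWS_KEY_ID = '<new_key_id>' AWS_SECRET_KEY = '<new_secret_key>');
```

If data files were loaded directly from the bucket URL instead, every `COPY INTO` statement carrying credentials would have to be updated on each rotation.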