2
votes

The concept of Internal Stage is misleading or I am interpreting this incorrectly. Please correct my understanding. According to the documentation

  1. Snowflake account can be hosted in any of three cloud platforms(AWS/Azure/GCP).
  2. So this means that all three layers of Snowflake architecture (Storage, Compute and Cloud Services) are deployed and managed on any of these selected cloud platforms.
  3. And now Snowflake refers to the location of data files in cloud storage as Stage. So referring to point 2, as storage is always deployed on any of the public cloud (AWS/Azure/GCP) Stages theoretically/implicitly refer to these external stages only always right? If this interpretation is correct why is there a concept of Internal Stage in the first place.
3
The main reason for the internal stage is to be able to upload files from connectors (ODBC, JDBC, Python, etc.) directly. There is a PUT command that will work for internal stages, but not external ones. docs.snowflake.com/en/sql-reference/sql/put.html#putGreg Pavlik

3 Answers

2
votes

An external stage is managed by you - the customer - and you can arrange files / secure the files in them however you like. Then, when you want to load data from an external stage into Snowflake you just reference those external stages.

An internal stage is managed by Snowflake and you can PUT files into them, everything else about them is managed by Snowflake. The storage of Snowflake internal stages is abstracted away from you. When I say PUT, this is a command that you can run using the Snowflake CLI that takes a local file and uploads it into an internal stage.

As to why internal stages exist? I suppose something along the lines of:

  • For flexibility, you can use Snowflake's internal blob storage (whatever that may be) or you can use your own storage to stage your data.

  • You can use Snowflake and load data into tables quickly without having blob storage of your own.

  • It makes it easier for non-administrator users. End-users of Snowflake can load data into their own tables without having to know how to use s3/azure blob/GCS etc. Each user gets their own little internal stage area at ~ like a home directory. Also, each table gets their own internal stage that you can put into.

1
votes

Unique to Snowflake is the concept of stage, it is the last place before data is loaded to a target table.

  1. User Stage, @~ (internal) every user has his own and you can load your file into there using a PUT command from your desktop, i.e. on prem to cloud. This location also happens to be the place where your worksheets are stored!
  2. Table stage, @% (internal) every table has one, you can PUT files there ready for loading to tables
  3. Named stage, @ (internal or external), when external the files are landed in S3 buckets in whatever method you used, maybe PUT. When internal they are loaded into Snowflake's own S3 bucket, using PUT. The difference? You manage the content completely externally if the file is there before loading, that includes encryption and compression. Snowflake does all that management if it is internally.

All content hosted as files externally or internally must be copied into a Snowflake table (COPY command) to take advantage of Snowflake's proprietary micropartitions storage mechanism and things like zero copy cloning. Alternatively you could still keep the files in an S3 bucket as an external stage but register the file as an external table to Snowflake and be able to run SQL on it. These are csv, parquet, avro, orc and json. Of course you don't get the benefits as listed above.

Basically, everything is a file before loading to Snowflake tables (which by the way, with improved compression algorithms achieves better compression too)

For your reading: https://docs.snowflake.com/en/user-guide/data-load-overview.html

0
votes

Internal stage is the storage that Snowflake provides and bills back to you. External stage is a reference to storage that is owned and paid for by the customer.

You are correct that this is still a public cloud resource, but internal stage is not accessible by anything other than Snowflake or the Snowflake connectors. Therefore, it is "internal".