2
votes

Is there a way to copy data from S3 into Snowflake without manually defining the columns beforehand?

I don't want to have to define the schema for the table in Snowflake OR the schema for which columns should be imported from S3. I want it to be schema-on-read, not schema-on-write.

I'm using a storage integration for access to the S3 external stage.

My question is a bit similar to this question, but I don't want to have to define any columns individually. If there's a way to just add on additional columns on the fly, that would solve my issue too.

2
we do something, like this, as we have a "csv" that really is just an one column which is a json blob, and then we load that into a holding table, and pull out the columns we want later. But other than having no columns like the above I am not sure how you can get away with no structure.Simeon Pilgrim

2 Answers

3
votes

We do not currently have schema inference for COPY. I am assuming that you already know about the variant column option for JSON but it will not give you full schematization. https://docs.snowflake.net/manuals/user-guide/semistructured-concepts.html

Dinesh Kulkarni

(PM, Snowflake)

2
votes

You need to use a third party tool that analyses your whole S3 data file in order to build an SQL schema from the data set in the file. Or maybe the tool is given access to the data source definition (which Snowflake hasn't) to make the job for the tool easier.

You might find snippets of Snowflake Stored Procedure code by searching around here at stackoverflow, that outputs schema definitions by eg. recursively flattening JSON data files.

If you want the import to be flexible, you need to use a flexible data format like JSON and a flexible SQL data type like VARIANT. This will work even if your data structures change.

If you want to use rigid formats like CSV or rigid SQL data types (most are rigid) then things get complicated. Rigid data are not flexible, and eg CSV files do not have any embedded type information, making for massive non-future-proof guesswork.
And maybe you are satisfied having all your columns end up as VARCHAR...