0
votes

If you don't know WHO or WHAT uploaded files to a named internal stage, how would you figure out what the record and line delimiters of that file are?

4

4 Answers

0
votes

Without knowing more about your specific use case and files, this doesn't sound possible at a glance. Imagine a document that contains both commas AND tabs; you wouldn't be able to tell which one is supposed to be the delimiter (if either).

I suppose you could pull it in as a single-column CSV with no delimiters to take a look at what's in it, but automatically detecting the delimiters and then translating that into a proper COPY INTO statement would still be a challenge.

If you know a subset of possible delimiters, this might be a little more tractable, but even then an obvious solution isn't jumping out at me.
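If you can pull a sample of the file down locally, the "known subset of delimiters" idea can be scripted with Python's standard-library csv.Sniffer. This is a sketch, not anything Snowflake-specific; the sample text here is made up for illustration, and in practice you would read the first few lines from the downloaded file.

```python
import csv

# Made-up sample of a file's first few lines (in practice, read them
# from the file you pulled down from the stage).
sample = "id\tname\tcity\n1\tAlice\tBerlin\n2\tBob\tParis\n"

# Sniffer guesses the field delimiter from the sample; restricting it
# to a known subset of candidates makes the guess more reliable.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t|;")

print(repr(dialect.delimiter))  # the guessed field delimiter, here '\t'
```

Sniffer is a heuristic and can guess wrong on messy data (exactly the comma-and-tab case above), so treat its answer as a starting point, not a verdict.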

0
votes

If there are files in your stage and you don't know who put them there or how, you have other problems...

But if you download SnowSQL, you can list the stage contents with LIST @~;

+-------------------------------------------------------+----------+------------------------------------+-------------------------------+
| name                                                  |     size | md5                                | last_modified                 |
|-------------------------------------------------------+----------+------------------------------------+-------------------------------|
| Bad17Q2LinksWith17Q1Speeds.csv.gz                     |   499888 | 8558ad33a9398f4bd268f03d5ba88851   | Thu, 10 Aug 2017 23:43:09 GMT |
| CustomerRampEdits.csv.gz                              |     7536 | 4f772a0c80466a522db4d7be0e83027a   | Mon, 14 Aug 2017 22:05:55 GMT |
| Rev2_Bad17Q2LinksWith17Q1Speeds_FIXED.csv.gz          |   771232 | cfef66b448e6a6c9cf5ebe780d9781f6   | Tue, 22 Aug 2017 02:43:11 GMT |
| Rev2_RemainingUnknown17Q2noSpeeds_FIXED.csv.gz        |     2736 | 99cc1f1dad01e98e40efab5ec5453a8e   | Tue, 22 Aug 2017 02:41:37 GMT |
+-------------------------------------------------------+----------+------------------------------------+-------------------------------+

and then use the GET command to pull it down like:

simeon#WH_NAME@DB_NAME.SCHEMA_NAME>get @~/CustomerRampEdits.csv.gz file://c:\temp\ ;
CustomerRampEdits.csv.gz(0.01MB): [##########] 100.00% Done (0.402s, 0.02MB/s).
+--------------------------+------+------------+---------+
| file                     | size | status     | message |
|--------------------------+------+------------+---------|
| CustomerRampEdits.csv.gz | 7521 | DOWNLOADED |         |
+--------------------------+------+------------+---------+
1 Row(s) produced. Time Elapsed: 3.047s
simeon#WH_NAME@DB_NAME.SCHEMA_NAME>

then you can look at it.
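Once the file is on disk, the record delimiter in particular is easy to check by peeking at the raw bytes. A minimal sketch, assuming the file is gzip-compressed as in the listing above (the path in the commented example is just an illustration):

```python
import gzip

def detect_line_ending(path):
    """Return a description of the record delimiter found in the first
    chunk of a gzipped text file."""
    with gzip.open(path, "rb") as f:
        chunk = f.read(64 * 1024)  # the first 64 KB is usually enough
    if b"\r\n" in chunk:
        return "\\r\\n (Windows)"
    if b"\n" in chunk:
        return "\\n (Unix)"
    if b"\r" in chunk:
        return "\\r (old Mac)"
    return "no line ending found in sample"

# Example usage against the file pulled down with GET:
# print(detect_line_ending(r"c:\temp\CustomerRampEdits.csv.gz"))
```

Check for \r\n before \n, since every Windows line ending also contains a bare \n.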

0
votes

You can reference columns positionally ($1, $2, and so on) to see the values. The only things you need to know are the file's type (CSV, TSV, etc.), and you must define a file format.

It looks like this:

select $1, $2 from @my_s3_location/file.csv
(FILE_FORMAT => 'MY_CSV');

The (FILE_FORMAT => 'MY_CSV') above is a little different in syntax compared to the COPY command, so pay attention.

Before that, you can run the LIST command to see which files are there, and then fill in the file name accordingly.

list @my_s3_location/

If you don't know what was uploaded, you can still use the same query: assume the file is comma-separated or tab-separated (if it is a text file), define a file format accordingly, select $1 to view the first column of each line, and then alter your file format and explore further.
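That trial-and-error loop can also be scripted once you have some sample lines in hand: try each candidate delimiter and prefer the one that splits every sampled line into the same number of columns. A sketch under those assumptions (the candidate set and the sample lines are made up for illustration):

```python
def guess_delimiter(lines, candidates=",\t|;"):
    """Pick the first candidate that splits every line into the same
    number of columns, with more than one column per line."""
    for delim in candidates:
        counts = {line.count(delim) + 1 for line in lines}
        # Consistent (one distinct column count) and actually splitting.
        if len(counts) == 1 and counts.pop() > 1:
            return delim
    return None

sample_lines = ["1|foo|2019-01-01", "2|bar|2019-02-01"]
print(guess_delimiter(sample_lines))  # -> '|'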

Hope this resolves your issue.

0
votes

You can select values directly from Snowflake internal stages; selecting a few records from the stage is a good way to learn more about your unknown data.