
I've staged some data from S3 into Snowflake, which I want to COPY into a table. However, I want some of the columns in the table to be values from the URL path of the staged data. For example -

The data is stored like this - s3://bucket1/subbucket1/object_ID/instance/type/file.json

I want to store the data in a table that looks like this:

object_ID instance type values from file (JSON)
2222 3333 type1 {JSON}

The only way I've found to filter a COPY INTO command is the PATTERN option, which copies only the staged files whose paths match a regex. Using that, I've been able to pull back certain files, but the resulting table still just contains the file contents — I can't get the path segments into their own columns.

1 Answer


I assume you are using the METADATA$FILENAME metadata column in your COPY INTO command? Have you tried parsing it directly as part of your SELECT?

Something along the lines of:

COPY INTO ...
FROM (
    -- METADATA$FILENAME is the path relative to the stage's base location,
    -- so adjust the part indices to match how deep your stage is defined
    SELECT SPLIT_PART(METADATA$FILENAME, '/', 5),
           SPLIT_PART(METADATA$FILENAME, '/', 6),
           SPLIT_PART(METADATA$FILENAME, '/', 7),
           $1
    FROM @STAGE
);

I haven't tried this in a COPY command myself, but it definitely works when selecting directly against the stage, so I'd imagine it'll work for the COPY as well. If it doesn't, let me know.
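
To find the right part indices for your stage, you can select against the stage directly first — the stage name and column aliases below are just placeholders for illustration. Note that METADATA$FILENAME does not include the `s3://bucket` prefix, only the path relative to the stage definition:

```sql
-- Sketch: inspect METADATA$FILENAME to see which SPLIT_PART index
-- maps to which path segment before writing the COPY INTO.
-- For a stage defined at s3://bucket1/, the filename would look like
-- 'subbucket1/object_ID/instance/type/file.json'.
SELECT METADATA$FILENAME,
       SPLIT_PART(METADATA$FILENAME, '/', 2) AS object_id,  -- assumed index
       SPLIT_PART(METADATA$FILENAME, '/', 3) AS instance,   -- assumed index
       SPLIT_PART(METADATA$FILENAME, '/', 4) AS type        -- assumed index
FROM @STAGE
LIMIT 10;
```

Once the indices line up with the segments you expect, plug the same SPLIT_PART expressions into the COPY INTO subquery above.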