0
votes

I am facing issue while I am trying to do a copy into from S3 stage (with column headers but in different order than that of target snowflake table) to snowflake db using this copy into command

COPY INTO db.schema.table FROM @stage/file.csv 
FILE_FORMAT=(TYPE= CSV field_delimiter ='|' SKIP_HEADER =1) ;

Is there any way that we can push the files in the similar order as of target table in snowflake?

I tried using SELECT as well in the COPY command, but no luck, as we ave to hard code the order. This is tedious and is not what Ia m looking for as I need to write for many tables.

I see there is MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE copy option, but it is not supporting for CSV file or structured data

2

2 Answers

0
votes

You have to either do it yourself, or grab a tool that is capable of doing the work for you. Snowflake is not capable of building this out for you.

Tools like FiveTran and Matillion are capable enough for this, but they're quite expensive and not worth the costs in many situations in my opinion.

Personally, I would suggest that your best bet is to write a script to generate the tedious COPY INTO statements that you mentioned earlier for you.

0
votes

As you said, the only way to do it natively is something like this:

    COPY INTO db.schema.table
    (
    col1
    , col2
    , col3
    , col4 
    )
  FROM (SELECT t.$4,t.$3,t.$2,t.$1
        FROM @stage/file.csv AS t);

But if you want to have it more automated and match by column name you'd have to write a stored procedure that executes a dynamic copy into statement. Or look at a 3rd party tool. To get started, you could build a stored proc that:

  1. queries the table DDL to get the column info for the table:

    SELECT * FROM db.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'schema' AND TABLE_NAME = 'table';

  2. iterates on the number of columns from the information_schema and builds a dynamic query on the staged file to get the header column names:

    SELECT t.$1,t.$2,t.$3,t.$4 FROM @stage/file.csv AS t ORDER BY metadata$file_row_number ASC LIMIT 1

  3. Generate a dynamic COPY INTO statement based on the order of the columns returned from the stage query. This would assume the header columns in the file would match the snowflake table column names.

If I have more time tonight/tomorrow I'll put together a working example and update this answer.