0
votes

I cannot figure out why Snowflake is not copying my data as it should, so I decided to come here and ask you.

Here's an explanation of my process. I'm using Talend for data transformation to extract data from some files/databases and import it to Snowflake in the end. Here is an overview of my Talend Job.

Talend Job.

My Talend job works fine and does not throw any error. It should be importing around 9.6k rows to the Snowflake table named Events. Unfortunately, that does not happen. Here's a copy/past of the exact procedure that happens on Snowflake side in the History pane.

 1. SHOW GRANTS TO USER identifier("AMBA");
 2. SELECT * FROM EVENTS;
 3. SELECT CURRENT_DATABASE(), CURRENT_SCHEMA()
 4. alter session /* JDBC:SnowflakeConnectionV1.setAutoCommit*/ set autocommit=false
 5. show /* JDBC:DatabaseMetaData.getPrimaryKeys()
 6. show /* JDBC:DatabaseMetaData.getColumns() 
 7. PUT 'file://C:\\Users\\amba\\AppData\\Local\\Temp\\snowflake\\stage\\EVENTS\\INSERT\\20210826_185008_230_NLF1NE_1\\stream_20210826_185008_2300.gz' '@~/EVENTS/INSERT/20210826_185008_230_NLF1NE_1' parallel=10 overwrite=true auto_compress=false SOURCE_COMPRESSION=gzip
 8. show columns in "EVENTS"
 9. CREATE TEMPORARY TABLE "EVENTS_20210826_185008_230_1" LIKE "EVENTS"
 10. COPY INTO "EVENTS_20210826_185008_230_1" FROM '@~/EVENTS/INSERT/20210826_185008_230_NLF1NE_1' on_error='CONTINUE' file_format=( field_optionally_enclosed_by='"' empty_field_as_null=true)
 11. COPY INTO "EVENTS_20210826_185008_230_1" FROM '@~/EVENTS/INSERT/20210826_185008_230_NLF1NE_1' validation_mode='return_all_errors' file_format=(field_optionally_enclosed_by='"'empty_field_as_null=true) -- HERE I CAN SEE MY 9.6K ROWS
 12. INSERT INTO "EVENTS"("ID","ODB_CREATED_AT","EVENT_TIME","DEVICE_TYPE","EVENT_TYPE","TICKET_TYPE","CARD_NR","COUNT","MANUFACTURER","CARPARK_ID") SELECT * FROM "EVENTS_20210826_185008_230_1"
 13. RM '@~/EVENTS/INSERT/20210826_185008_230_NLF1NE_1'
 14. commit
 15. commit

Here's the history with a screenshot:

History

Here are the copy and insert statements in ascending order, first it throws an error: Inserts and copy statements

1) Load error, all 1 files were rejected: COPY INTO "EVENTS_20210826_185008_230_1" FROM '@~/EVENTS/INSERT/20210826_185008_230_NLF1NE_1' on_error='CONTINUE' file_format=( field_optionally_enclosed_by='"' empty_field_as_null=true)

2) Succeeded: The copy into: COPY INTO "EVENTS_20210826_185008_230_1" FROM '@~/EVENTS/INSERT/20210826_185008_230_NLF1NE_1' validation_mode='return_all_errors' file_format=(field_optionally_enclosed_by='"'empty_field_as_null=true)

3) Succeeded: The insert statement: INSERT INTO "EVENTS"("ID","ODB_CREATED_AT","EVENT_TIME","DEVICE_TYPE","EVENT_TYPE","TICKET_TYPE","CARD_NR","COUNT","MANUFACTURER","CARPARK_ID") SELECT * FROM "EVENTS_20210826_185008_230_1"

I have tried to import the data to postgres databases and simple csv files, works perfectly fine, so the error cannot come from Talend, something has to be wrong in Snowflake.

All snowflake does is:

  • Connects to the database
  • Puts my insert file generated from talend to the server
  • Copies the data to a temp table
  • Copies the data from the temp table to my events table
  • Removes the temp table
  • Commits

What am I missing here?

Thank you

2
Please can you provide the full INSERT INTO EVENTS statement. It might be worth not deleting the temp table (while testing) so you can see if it is being populated correctlyNickW
Hi @NickW, I just edited my post and added the statements after the second screenshot. As I can see, it first throws an error but then copies my 9.6k lines to the temp table. I unfortunately cannot prevent snowflake from deleting the temps table since it's an automatic process... What I can see from the logs is that the temp event table does get filledDjabone

2 Answers

1
votes

Here's the reason and solution.

I tried to import the id as a primary key with Talend to Snowflake. Snowflake does not allow you to utilize and id column insert, since those are generated automatically as we define it while creating the table.

Example:

CREATE TABLE IF NOT EXISTS public.events (
    id bigint NOT NULL autoincrement start 1 increment 1 PRIMARY KEY,
    odb_created_at timestamp without time zone NOT NULL,
    event_time timestamp without time zone NOT NULL,
    device_type integer NOT NULL,
    event_type integer NOT NULL,
    ticket_type integer NOT NULL,
    card_nr character varying(100),
    count integer DEFAULT 1 NOT NULL,
    manufacturer character varying(200),
    carpark_id bigint

I had to remove my ID inserts generated by Talend and let Snowflake do the work. The reason the NOT NULL error code occurred was due to the fact id in SQL was set to NOT NULL(of course because it is an ID). DO NOT USE IDS WITH SNOWFLAKE.

Solution: Use autoincrement in Snowflake directly or use sequences with no autoincrement.

0
votes

are you checking the temp table result?