
I created a CSV file using Spark as follows:

t1.write.option("sep","\001").mode("overwrite").format("csv").save("s3://test123/testcsv001/")

I then tried a COPY command in Redshift to load the CSV file:

copy schema123.table123
from 's3://test123/testcsv001/'
access_key_id 'removed'
secret_access_key 'removed'
session_token 'removed'
TIMEFORMAT 'auto'
DATEFORMAT 'auto'
DELIMITER '\001'
IGNOREHEADER AS 0
TRUNCATECOLUMNS
NULL as 'NULL'
TRIMBLANKS
ACCEPTANYDATE
EMPTYASNULL
BLANKSASNULL
ESCAPE
COMPUPDATE OFF
STATUPDATE ON;

The command fails on records where the first column has a null value. The first column is defined as a LONG in Spark, and the target column is a BIGINT with no NOT NULL constraint. I also tried casting the column to INT in Spark before writing it to CSV, and the load still failed for the same reason.

Per the Redshift documentation, loading NULLs into a BIGINT column should work fine.

Any insight into this?

One sample row, please? - Bhuvanesh
What Spark version are you using? - Harshit Mahajan

1 Answer


You are setting NULL AS 'NULL'. This means that whenever the string "NULL" appears in your source file, Redshift treats the value as NULL. So when your input file has "" as the input to a BIGINT column, what is Redshift supposed to do? You told it you would supply the string "NULL" whenever a value is NULL.
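
To make that concrete, here is how a few \001-delimited rows would be interpreted with NULL AS 'NULL'. These rows are made up, and <0x01> stands in for the non-printable delimiter:

NULL<0x01>abc   -> first field is the literal string "NULL", so it loads as NULL
42<0x01>abc     -> first field is "42", so it loads as 42
<0x01>abc       -> first field is an empty string, which is not a valid BIGINT, so the load fails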

I expect you want NULL AS '', and you should also set the file type to CSV so that standard CSV rules apply.
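
A minimal sketch of the revised command, assuming the same path and credential placeholders as in your post (ESCAPE is dropped here because, as far as I recall, it can't be combined with the CSV option):

copy schema123.table123
from 's3://test123/testcsv001/'
access_key_id 'removed'
secret_access_key 'removed'
session_token 'removed'
CSV                  -- parse the input with standard CSV rules (quoting, embedded delimiters)
DELIMITER '\001'
NULL as ''           -- an empty field now loads as NULL, including into BIGINT columns
TIMEFORMAT 'auto'
DATEFORMAT 'auto'
IGNOREHEADER AS 0
TRUNCATECOLUMNS
TRIMBLANKS
ACCEPTANYDATE
EMPTYASNULL
BLANKSASNULL
COMPUPDATE OFF
STATUPDATE ON;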