I created a CSV file using Spark as follows:
t1.write.option("sep","\001").mode("overwrite").format("csv").save("s3://test123/testcsv001/")
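For context, here is a minimal PySpark sketch of the write; the sample DataFrame, schema, and column names below are placeholders rather than my real data. Note that Spark's CSV writer emits null values as empty fields by default (the nullValue option defaults to the empty string), so the nulls show up in the file as nothing between two delimiters.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType

spark = SparkSession.builder.appName("csv-null-repro").getOrCreate()

# Placeholder schema: the first column is a nullable LONG, matching the BIGINT target column.
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])
t1 = spark.createDataFrame([(1, "a"), (None, "b")], schema)

# Same write as above: \001 (SOH) as the field separator, overwrite mode, CSV format.
# Null values are written as empty fields because nullValue defaults to "".
(t1.write
   .option("sep", "\001")
   .mode("overwrite")
   .format("csv")
   .save("s3://test123/testcsv001/"))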
I then tried a COPY command in Redshift to load the file:
copy schema123.table123
from 's3://test123/testcsv001/'
access_key_id 'removed'
secret_access_key 'removed'
session_token 'removed'
TIMEFORMAT 'auto'
DATEFORMAT 'auto'
DELIMITER '\001'
IGNOREHEADER AS 0
TRUNCATECOLUMNS
NULL as 'NULL'
TRIMBLANKS
ACCEPTANYDATE
EMPTYASNULL
BLANKSASNULL
ESCAPE
COMPUPDATE OFF
STATUPDATE ON;
The COPY fails on records where the first column is null. In Spark the first column is defined as a LONG; the target Redshift column is a BIGINT with no NOT NULL constraint. I also tried casting the column to INT in Spark before writing the CSV, and the load still failed for the same reason.
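The cast attempt looked roughly like this (the column name is a placeholder):

from pyspark.sql import functions as F

# Cast the nullable LONG column to INT before writing; the null rows still fail to load.
t1_int = t1.withColumn("id", F.col("id").cast("int"))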
Per the Redshift documentation, loading NULLs into a BIGINT column should work fine.
Any insight into this?