0
votes

I am working on loading my data from s3 to redshift. I noticed a shift in the data type in my query from the redshift error logs.

This is the table I am creating...

main_covid_table_create = ("""
                        CREATE TABLE IF NOT EXISTS main_covid_table(
                        SNo INT IDENTITY(1, 1),
                        ObservationDate DATE,
                        state VARCHAR,
                        country VARCHAR,
                        lastUpdate DATE,
                        Confirmed DOUBLE PRECISION,
                        Deaths DOUBLE PRECISION,
                        Recovered DOUBLE PRECISION
                        )
                        """)

with copy command as

staging_main_covid_table_copy = ("""
                            COPY main_covid_table
                            FROM {}
                            iam_role {}
                            DELIMITER ','
                            IGNOREHEADER 1
                            DATEFORMAT AS 'auto'
                            NULL AS 'NA'
                            """).format(COVID_DATA, IAM_ROLE)

I get his error from redshift after running the script:

enter image description here

My interpretation of this error is that the data type of lastUpdate is been used for the country column. Can anyone help with this?

1
Please provide some sample values for the lastUpdate field from the input file. - John Rotenstein
Thank you @JohnRotenstein. lastUpdate looks like this 2020-01-22 17:00:00, 2021-01-20 05:21:54 - Ajayi Olaniyi
Can you post a few lines of your source data file (including the failing line if possible)? It looks like your table column definitions and the data organization in the input file are not staying aligned. Delimiters in the source data or missing values w/o delimiter could cause this. Seeing what data Redshift is trying to work with will help us better assess what the issue is. - Bill Weiner

1 Answers

0
votes

Presumably, your error output is from STL_LOAD_ERRORS, in which case the third last column is defined as: "The pre-parsing value for the field "colname" that lead to the parsing error.".

Thus, it is saying that there is a problem with country, and that it is trying to interpret it as a date. This does not make sense given the definitions you have provided. In fact, it looks as if it is trying to load the header line as data, which again doesn't make sense given the presence of IGNOREHEADER 1. It also looks like there is a column mis-alignment.

I recommend that you examine the full error details from the STL_LOAD_ERRORS line including the colname and try to figure out what is happening with the data. You could start with just one line of data in the file and see whether it works, then keep adding the data back to find what is breaking the load.