2
votes

I am trying to copy data from S3 to amazon redshift by Python script

command = 

    ("COPY {qualname}\n"
                "FROM 's3://{keypath}' \n"
                "CREDENTIALS 'aws_access_key_id={key};aws_secret_access_key={secret}' "
                "{gzip} "
                "{null_as} "
                #"{emptyasnull}"
                "CSV IGNOREHEADER 1;").format(qualname=qualname,
                                 keypath=url,
                                 key=aws_access_key_id,
                                 secret=aws_secret_access_key,
                                 gzip="GZIP " if compress else " ",
                                 null_as="NULL AS '{}'".format(null_as) if null_as is not None else "",
                                 emptyasnull="EMPTYASNULLL " if emptyasnull else " ")```
  • keypath is s3://{bucket - name}/daily-sku-benefits/2018-12-27
  • qualname is the table name

However I am getting an error:

(psycopg2.InternalError) Load into table 'daily_sku_benefits' failed. Check 'stl_load_errors' system table for details. [SQL: "COPY daily_sku_benefits\nFROM 's3://{bucket - name}/daily-sku-benefits/2018-12-27.csv.gzip' \nCREDENTIALS 'aws_access_key_id={access key};aws_secret_access_key={secret-key} GZIP CSV IGNOREHEADER 1;"] (Background on this error at: http://sqlalche.me/e/2j85)

And on checking stl_load_error I am getting this error:

Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SS]  

Does anybody have any idea about it?

1
when you cannot guarantee the integrity of the values, load these timestamps into a varchar first then post process them. - Jon Scott
What was the format of the date it tried to load? - John Rotenstein
envar_date_string = os.environ.get(key='BILL_DATE',default=datetime.today().strftime("%Y-%m-%d")) envar_date = datetime.strptime(envar_date_string, "%Y-%m-%d") billing_date = datetime(envar_date.year, envar_date.month, envar_date.day) - sid297

1 Answers

1
votes

Error clearly shows you the way to resolve the issue, your timestamp data must be of format [YYYY-MM-DD HH24:MI:SS].

Your data should look like below (I have separator as pipe(|), though , is fine too),

 Column1|Columnd2|created_at
 Test1|Test description1|2018-05-10 23:54:51
 Test2|Test description2|2018-05-17 22:15:53

For example table with structure-

create table daily_sku_benefits(
Colunm1 varchar(55),
Colunm2 varchar(255),
updated_at timestamp);

One more trick you could apply, because it might be possible that you complete data might not be bad, it may be some percentage, Hence you could use MAXERROR property in Copy command. Here goes Maxerror explanation.

Hope it helps.