0
votes

I have bunch of parquet files with timestamp fields in different format. I would like to unify the formats while loading the data into redshift table. But below syntax is not working and it throws error TIMEFORMAT argument is not supported for PARQUET based COPY.

Any work arounds ? I can think of only one solution which is unify all timestamp formats while creating parquet file.Since I have to load 2-3 years of parquet files , so this solution is time consuming. Could any one let me know if you have any other solution OR anything missing in below syntax

Syntax :

COPY redshift_table_name
FROM 's3://path/to/files'
IAM_ROLE '*****'
TIMEFORMAT AS 'YYYY-MM-DDTHH:MI:SS'
FORMAT AS PARQUET
1
How about defining a number of Spectrum tables over the parquet files, one for each different date format? You could then query from those Spectrum tables into a single Redshift table, standardising the dates at the same time. You haven't mentioned how many unique date formatss there are but if it's only a handful then this may work for you. - Nathan Griffiths

1 Answers

2
votes

If you can use the Pyarrow library, load the parquet tables and then write them back out in Parquet format using the use_deprecated_int96_timestamps parameter. Redshift will correctly recognize those. I haven't had any luck getting it to properly recognize any other timestamp formats when loading Parquet.