2
votes

I have an AWS Glue crawler that creates a Data Catalog with all the tables from an S3 directory containing Parquet files.

I need to copy the contents of these files/tables into Redshift tables. For a few tables, the Parquet data is too large for Redshift to hold: even VARCHAR(65535), the maximum, is not sufficient.

Ideally, I would like to truncate the over-long values in these tables.

How do I use the COPY command to load this data into Redshift? If I use Spectrum, I can only use INSERT INTO from the external table into the Redshift table, which I understand is slower than a bulk COPY.
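For context, this is the kind of COPY I have in mind; a minimal sketch where the schema, table, bucket path, and IAM role are placeholders:

```sql
-- Bulk-load Parquet files from S3 into an existing Redshift table.
-- Schema, table, bucket, and role names are placeholders.
COPY my_schema.my_table
FROM 's3://my-bucket/path/to/parquet/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;
```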


1 Answer

0
votes

You can use string instead of varchar(65535) (this can be edited in the Glue Data Catalog as well); if not, can you elaborate more on this? If the files are in Parquet, then most of the data conversion parameters that COPY provides cannot be used, such as ESCAPE, NULL AS, etc.

https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
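If truncating the over-long values is acceptable, one workaround is to read the tables through Spectrum and cut the values in the SELECT, since conversion parameters like TRUNCATECOLUMNS are not available for Parquet COPY. A sketch, assuming the crawler's Glue database is exposed as an external schema; every name here is a placeholder:

```sql
-- One-time setup: expose the Glue Data Catalog database as an external schema.
CREATE EXTERNAL SCHEMA spectrum_glue
FROM DATA CATALOG
DATABASE 'my_glue_database'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Load into the local table, cutting the oversized column down to
-- Redshift's VARCHAR maximum with LEFT().
INSERT INTO my_schema.my_table (id, big_text)
SELECT id, LEFT(big_text, 65535)
FROM spectrum_glue.my_external_table;
```

Spectrum also has a surplus_char_handling table property that can be set to truncate over-length values at scan time; it may be worth checking the data handling section of the Redshift docs.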