
I found that we can use the spectrify Python module to convert to Parquet format, but I want to know which command will unload a table to an S3 location in Parquet format.

One more thing: I found that we can load Parquet-formatted data from S3 into Redshift using the COPY command: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#r_COPY_command_examples-load-listing-from-parquet

Can we do the same for an unload from Redshift to S3?

You cannot do that, would be nice though! - Jon Scott
Thank you, Jon Scott. Do you have any suggestions for doing it without a Python script? - KarthiK
Spectrify is a great tool and makes the process quite painless. Converting properly from CSV to Parquet is non-trivial. - Jon Scott
Thank you, Jon Scott. I will look into that. Can you share a basic model of how to use spectrify in a Python unload script? - KarthiK
See pypi.org/project/spectrify, under "Perform all 3 steps in sequence, essentially “copying” a Redshift table to Spectrum in one command." - Jon Scott

2 Answers

1
votes

Have you considered AWS Glue? You can create a Glue Catalog based on your Redshift sources and then convert the data to Parquet. There is an AWS blog post for your reference; although it talks about converting CSV to Parquet, you get the idea.

3
votes

There is no need to use AWS Glue or a third-party Python module to unload Redshift data to S3 in Parquet format. The feature is now natively supported:

UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
FORMAT PARQUET

Documentation can be found at UNLOAD - Amazon Redshift.
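As a concrete sketch: a real UNLOAD also needs authorization, typically via an IAM role attached to the cluster. The bucket name, table name, and role ARN below are placeholders, not values from the question:

```sql
-- Unload a table to S3 as Parquet files.
-- Bucket, table, and role ARN are placeholders; substitute your own.
UNLOAD ('SELECT * FROM my_table')
TO 's3://my-bucket/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
FORMAT AS PARQUET;
```

Redshift writes the result as one or more Parquet files under the given prefix; UNLOAD also accepts a PARTITION BY clause with Parquet output if you want a Hive-style partitioned layout.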