I have Avro files (compressed using BZIP2) stored in HDFS and S3, and I want to load them into Amazon Redshift. The COPY command fails with this error:

    error:  Invalid AVRO file
    code:      8001
    context:   Cannot init avro reader from s3 file File header contains an unknown codec

Does Redshift not support compressed Avro files?

If that's the case, what is the next best option for loading this data into Redshift, without first rewriting the files as uncompressed Avro?

Can I use Sqoop?

1 Answer

Redshift does support compressed Avro files.

To load data files that are compressed with gzip, lzop, or bzip2, include the corresponding option (GZIP, LZOP, or BZIP2) in the COPY command.

You also need to specify the Avro format and provide the JSON path. This is the command I used, and it works:

    copy <tablename> from '<s3 path - abc.avro.gz>'
    credentials 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret access key>'
    format as avro '<json path for avro format>'
    gzip;
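
One thing worth checking if the error persists: the message refers to the codec named in the Avro file header. The GZIP, LZOP, and BZIP2 options describe compression applied to the file as a whole (as with abc.avro.gz above), which is separate from a codec used inside the Avro container (the avro.codec entry in the header). As far as I know, the COPY documentation lists only the default uncompressed codec plus deflate and snappy for Avro, so a file written with bzip2 as its internal codec might explain "unknown codec". Here is a minimal sketch, using only the Python standard library, that reads the header of an Avro container file and reports the declared codec. The function name read_avro_codec and the file name in the usage note are my own placeholders, not part of any library.

```python
MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def _read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of Avro header")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear means this was the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def _read_bytes(stream):
    """Read a length-prefixed byte string (Avro 'bytes' and 'string')."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_codec(stream):
    """Return the codec declared in an Avro container header.

    `stream` is a binary file-like object positioned at the start of
    the file. Hypothetical helper, written for illustration only.
    """
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero count terminates the metadata map
            break
        if count < 0:  # negative count is followed by the block size in bytes
            _read_long(stream)
            count = -count
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    # The Avro spec treats a missing avro.codec entry as "null" (uncompressed).
    return metadata.get("avro.codec", b"null").decode("utf-8")
```

To use it, open one of the files in binary mode, for example open("part-00000.avro", "rb") (a made-up file name), and pass the file object to read_avro_codec. If the file is additionally gzipped as a whole, open it with gzip.open instead so the function sees the Avro bytes.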