1 vote

I was trying the BigQuery export to Cloud Storage feature to save some money. What I want is to export compressed AVRO files, to keep the schema and be able to import them back into BigQuery if needed. BigQuery charges by the uncompressed size of the data, and my data is highly redundant, so compressing it should reduce the size by up to 20x.

In the web UI there is no option to compress when exporting to AVRO, so I assumed it would be compressed by default, but it isn't. It exports AVRO without compression, which makes no sense to me, because the files are the same size as the table, so keeping them in Cloud Storage costs the same as keeping them in BigQuery.

https://cloud.google.com/bigquery/docs/exporting-data

There is no information there about compression.

Does anyone know another way to do this, other than exporting the data, loading it on a cluster, converting it to compressed AVRO, and saving it back to Cloud Storage?


2 Answers

3 votes

After a lot of research, a friend discovered, by looking at the Python code of the BigQuery client library, that there are some undocumented options for AVRO compression that you can pass to the API: DEFLATE and SNAPPY.

After that, I also found it documented at: https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationExtract.html#setCompression-java.lang.String-

I think this is new; it isn't documented yet and isn't in the web interface yet.

I tested it and it works! For one of my tables, the export without compression was a 2.8 GB AVRO file; with DEFLATE it is 170 MB.
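For reference, here is a minimal sketch of setting the compression option with the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are placeholders, not from the original post:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Placeholder names -- replace with your own table and bucket.
    source_table = "my-project.my_dataset.my_table"
    destination_uri = "gs://my_bucket/my_table-*.avro"

    # Export as AVRO and pass the compression option.
    job_config = bigquery.ExtractJobConfig()
    job_config.destination_format = bigquery.DestinationFormat.AVRO
    job_config.compression = bigquery.Compression.DEFLATE  # or SNAPPY

    extract_job = client.extract_table(
        source_table, destination_uri, job_config=job_config
    )
    extract_job.result()  # wait for the export to finish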

0 votes

"I think this is new; it isn't documented yet"

DEFLATE and SNAPPY compression for AVRO is documented under configuration.extract.compression in the Jobs API reference.
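In the REST API, the extract job configuration would look roughly like this (all project, dataset, table, and bucket names here are placeholders):

    {
      "configuration": {
        "extract": {
          "sourceTable": {
            "projectId": "my-project",
            "datasetId": "my_dataset",
            "tableId": "my_table"
          },
          "destinationUris": ["gs://my_bucket/my_table-*.avro"],
          "destinationFormat": "AVRO",
          "compression": "DEFLATE"
        }
      }
    }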

You can also see it in the bq command-line tool:

bq help extract
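For example, a compressed AVRO export would look something like this (the table and bucket names are placeholders):

    bq extract --destination_format=AVRO --compression=DEFLATE \
        my_dataset.my_table gs://my_bucket/my_table-*.avro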

"isn't in the web interface yet"

Yes. The compression option for AVRO is not available in the BigQuery UI, in either the Classic or the New UI.

It should be available in the API, the bq command line, and whatever client libraries have already implemented compression for AVRO.