3
votes

I'm exporting a BigQuery table to Google Cloud Storage with an extract job, using the compression: 'GZIP' option.

But it doesn't work: the table is extracted to GCS as a plain CSV file, not as a gzip file.

It worked yesterday, but it doesn't work today.

2
Hi. I had a similar issue with my API as you described above, but I was able to find the root of the issue. Moving a BigQuery table to a Google Cloud Storage bucket via the API (or UI) works fine as far as compression goes. However, when I download the file from Google Cloud Storage to my server with blob.download_to_file, it gets decompressed instead. Similarly, my ETL script had been running fine for over a year until today (it always received whatever was in the bucket, without the files being decompressed). - softdevlife
Hey, I think this should be reported to Google itself because it's so weird. When I download the gzip files in my Google Cloud Storage bucket through the UI, the files are decompressed as my web browser downloads them. Besides the file being larger than what I have in my bucket, my decompression software doesn't recognise the file, and only after I remove the "gz" file extension can I open it in Notepad. Did Google make a core change to how files are downloaded from a Google Cloud Storage bucket? - softdevlife
It appears that the files I generated from BigQuery are now in a format that lets Google Cloud Storage apply decompressive transcoding: cloud.google.com/storage/docs/transcoding - softdevlife

2 Answers

4
votes

As commented, it's due to GCS' decompressive transcoding. I think it's a bug that the BQ compressed export ends up as uncompressed. We'll see if they change it during the day.

Workaround: Reset the header

gsutil setmeta -h "Content-Encoding: " gs://bucket_name/path/*.gz

Public tracker: https://issuetracker.google.com/issues/113252895
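
If you would rather reset the header from Python than from gsutil, a rough sketch could look like the one below. It assumes the google-cloud-storage client library, whose Blob objects expose a content_encoding property and a patch() method. The helper name clear_content_encoding and the ".gz" suffix filter are made up for illustration, and this has not been run against a real bucket.

```python
def clear_content_encoding(blobs, suffix=".gz"):
    """Clear Content-Encoding on matching blobs and return their names.

    `blobs` is any iterable of objects that look like google-cloud-storage
    Blob instances (attributes `name` and `content_encoding`, method `patch`).
    """
    changed = []
    for blob in blobs:
        if not blob.name.endswith(suffix):
            continue  # leave unrelated objects alone
        if blob.content_encoding:
            # Assumption: patching the property to None clears the header,
            # so GCS serves the stored gzip bytes as they are.
            blob.content_encoding = None
            blob.patch()
            changed.append(blob.name)
    return changed


# Possible usage (requires google-cloud-storage and credentials):
# from google.cloud import storage
# bucket = storage.Client().bucket("bucket_name")
# clear_content_encoding(bucket.list_blobs(prefix="path/"))
```

Blobs that have no Content-Encoding set are skipped, so running it again should not issue further updates.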

3
votes

I experienced the same problem today. It seems BigQuery now saves the files to the Google Cloud Storage bucket in such a way that, although they are compressed in your bucket, their metadata allows Google Cloud Storage to decompress them on download (this is called decompressive transcoding). I found a solution to my problem, not in the BigQuery API, but in the Cloud Storage API.

Before I run:

blob.download_to_file(file_obj)  # file_obj: a file object opened in binary write mode

I use:

blob.cache_control = 'no-transform'

That seems to fix my problem. Note that this solution is for google-cloud-python. Your tools may be different, but they may offer a similar option, so I hope this helps someone.
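
Whichever workaround you use, it can help to check whether a downloaded file really arrived compressed, since a ".gz" name no longer guarantees it. Here is a small sketch using only the Python standard library: gzip streams start with the two bytes 0x1f 0x8b, so the script looks at those and opens the file accordingly. The function names is_gzip and open_export are made up for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_export(path, encoding="utf-8"):
    """Open an exported file as text, whether or not it was transcoded."""
    if is_gzip(path):
        return gzip.open(path, "rt", encoding=encoding)
    # The file was decompressed on the way (or never compressed).
    return open(path, "r", encoding=encoding)
```

An ETL script that reads its input through open_export keeps working both before and after the download behaviour changes.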