0 votes

I need to export data from BigQuery to Google Cloud Storage on a daily basis. The data volume is rather large (about 1 TB), and after I export it to Cloud Storage I have to download it, which is a very slow step. So I am wondering whether I can export gzipped data to Cloud Storage. That would reduce the data volume, and then I could download it much more quickly.

Could you give me some advice on this? I didn't find a compression option in the BigQuery API when extracting from BigQuery to Google Cloud Storage.

Thanks in advance!


2 Answers

1 vote

Now you can export with gzip compression to GCS.

Plus, if the exported data is larger than 1 GB, you can include a '*' wildcard in the destination URI, which shards the output into multiple smaller files.
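
For example, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are all placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Ask BigQuery to gzip the exported files.
job_config = bigquery.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP

# The '*' wildcard shards the export into multiple files, which is
# required when more than 1 GB would otherwise go into a single file.
destination_uri = "gs://my-bucket/daily-export/data-*.csv.gz"

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",  # placeholder table reference
    destination_uri,
    job_config=job_config,
)
extract_job.result()  # block until the export job completes

If you'd rather script it in a shell, the bq command-line tool exposes the same option as bq extract --compression=GZIP.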

1 vote

Unfortunately, there is no gzip option.

That said, you can let automatic HTTP compression gzip the data for you when you download the files from Google Cloud Storage. Just add these HTTP request headers:

Accept-Encoding: gzip
User-Agent: anything

It may seem strange that you need to set a User-Agent header. It is strange to us too. This requirement is common across a number of Google products and is designed to avoid bugs in browsers that don't handle compression correctly (see https://developers.google.com/appengine/kb/general?csw=1#compression).
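
To make that concrete, here is a minimal download sketch using the Python requests library; the bucket and object names are placeholders, and a publicly readable object is assumed (a private object would also need an Authorization header):

import requests

# Placeholder URL for a publicly readable object; private objects
# additionally need an "Authorization: Bearer <token>" header.
url = "https://storage.googleapis.com/my-bucket/daily-export/data.csv"

headers = {
    "Accept-Encoding": "gzip",             # ask the server to compress the response
    "User-Agent": "my-downloader (gzip)",  # required alongside Accept-Encoding
}

resp = requests.get(url, headers=headers, stream=True)
resp.raise_for_status()

with open("data.csv", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)  # requests decompresses the gzip stream transparently

(Note that requests already sends Accept-Encoding: gzip by default; it is spelled out here only to make the mechanism explicit.)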

If you're using gsutil to download the files, it will add the compression headers automatically.