4
votes

Is it possible for BigQuery to return a compressed/gzipped response? I noticed when running a query like:

select * from [dataSet.tableId]

from a table with a large amount of columns (schema), but only 1mb in size, the response is 13mb. It's understandable that BigQuery FLATTENs one of the records, and the response is therefore larger because of the JSON structure. The resulting table in BigQuery however, is 2.04mb in size. I am not using any BigQuery client libraries, but making direct HTTP requests instead. Is there a work around to getting a smaller response with the actual tableData?

Thanks

2

2 Answers

2
votes

You can use the tabledata collection to page through the data in the table. (note that if you're using jobs.getQueryResults(), the same technique works as well.

Also, note that there is really not a good reason to use "select *" queries, since the tabledata.list() operation can let you read the data from a table and a "select *" query costs money, while tabledata.list() does not.

For instance:

PROJECT=my_project
DATASET=my_dataset
TABLE=my_table
BASE_URL=https://www.googleapis.com/bigquery/v2
TABLES_URL=${BASE_URL}/projects/${PROJECT}/datasets/${DATASET}/tables
TABLEDATA_URL=${TABLES_URL}/${TABLE}/data
curl  -H "Authorization: Bearer ${AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -X GET \
    "${TABLEDATA_URL}?maxResults=10"

Note that this will return a field called 'pageToken'. You can use this to fetch the next page of results:

PAGE_TOKEN=<page token from response>
curl  -H "Authorization: Bearer ${AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -X GET \
    "${TABLEDATA_URL}?maxResults=10&pageToken=${PAGE_TOKEN}"
2
votes

To get compressed output using the raw, HTTP API, you need to specify two headers -- both the User-Agent and Accpet-Encoding headers. The User-Agent must be one that is known to be able to handle gzip.

Accept-Encoding: deflate, gzip
User-Agent: gzip

There is more information in the AppEngine documentation here: http://code.google.com/appengine/kb/general.html#compression. While BigQuery is not AppEngine, it shares many of the network infrastructure pieces from AppEngine, including the determination of when to server compressed content.