I have written a simple Java application to export tables from Google BigQuery using the tabledata.list method (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list), with pageToken for paging. No matter what I set the maxResults parameter to, I can only retrieve about 5000 rows per request (depending on row size). Since each request takes several seconds, this way I can only download about 100 MB per minute on average.
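For reference, the paging loop looks roughly like this. This is a minimal sketch, not my actual client code: `Page` and `fetchPage` are stand-ins for the real HTTP response and tabledata.list call, and the hard-coded 5000-row pages just mimic the behaviour I am seeing.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedExport {

    // Hypothetical response shape: a batch of rows plus an optional
    // token pointing at the next page (null on the last page).
    static class Page {
        final List<String> rows;
        final String nextPageToken;
        Page(List<String> rows, String nextPageToken) {
            this.rows = rows;
            this.nextPageToken = nextPageToken;
        }
    }

    // Stand-in for a tabledata.list call; a real client would issue an
    // HTTP GET with maxResults and pageToken query parameters. Here we
    // simulate a 12000-row table served in 5000-row pages.
    static Page fetchPage(String pageToken) {
        int start = (pageToken == null) ? 0 : Integer.parseInt(pageToken);
        List<String> rows = new ArrayList<>();
        for (int i = start; i < Math.min(start + 5000, 12000); i++) {
            rows.add("row-" + i);
        }
        int next = start + 5000;
        return new Page(rows, next < 12000 ? Integer.toString(next) : null);
    }

    // The sequential loop: each page's token feeds the next request,
    // so requests cannot overlap and latency adds up per page.
    public static List<String> exportAll() {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Page page = fetchPage(token);
            all.addAll(page.rows);
            token = page.nextPageToken;
        } while (token != null);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(exportAll().size()); // prints 12000
    }
}
```

The problem is visible in the structure: because each pageToken only exists after the previous response arrives, the loop is inherently serial.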
The ways I have found to speed this up so far:

Batch processing (not suitable in my case):
- batch export to Google Cloud Storage

Parallelising requests:
- using startIndex
- using dynamic table partitions
It seems the most performant approach for my use case is the last option, combined with the snapshot decorator to get a stable result in case the table changes:
myproject:mydataset.mytable@timestamp$0-of-3
myproject:mydataset.mytable@timestamp$1-of-3
myproject:mydataset.mytable@timestamp$2-of-3
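The fan-out I have in mind looks roughly like this. Again a sketch under assumptions: `readPartition` is a placeholder for the per-partition tabledata.list paging loop, and the snapshot timestamp is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedExport {

    // Build the decorated table names: snapshot decorator (@millis)
    // plus the $i-of-n partition decorator.
    static List<String> decoratedNames(String table, long snapshotMillis, int parts) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            names.add(table + "@" + snapshotMillis + "$" + i + "-of-" + parts);
        }
        return names;
    }

    // Stand-in: a real implementation would page through tabledata.list
    // for the given decorated table and return its row count.
    static int readPartition(String decoratedTable) {
        return decoratedTable.length(); // placeholder work
    }

    public static void main(String[] args) throws Exception {
        List<String> names =
            decoratedNames("myproject:mydataset.mytable", 1454284800000L, 3);
        ExecutorService pool = Executors.newFixedThreadPool(names.size());
        List<Future<Integer>> pending = new ArrayList<>();
        for (String name : names) {
            // One reader per partition; each runs its own paging loop.
            pending.add(pool.submit(() -> readPartition(name)));
        }
        int total = 0;
        for (Future<Integer> f : pending) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("partitions read: " + names.size());
    }
}
```

Since all three readers share the same snapshot timestamp, the partitions should together form a consistent view of the table even while it is being written to.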
So my questions are:
- Is there a better (i.e. faster) approach?
- Do tabledata.list requests count against the limit of 50 concurrent requests?