
I have written a simple Java application to export tables from Google BigQuery using the tabledata.list method (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list) with pageToken for paging. No matter what I set the maxResults parameter to, I can only retrieve about 5,000 rows per request (depending on row size). Since each request takes several seconds, this way I can only download about 100 MB per minute on average.
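For reference, my paging loop looks roughly like this (a minimal sketch using the google-api-services-bigquery client; it assumes an already authenticated Bigquery instance, and the project/dataset/table names are placeholders):

    import java.util.List;

    import com.google.api.services.bigquery.Bigquery;
    import com.google.api.services.bigquery.model.TableDataList;
    import com.google.api.services.bigquery.model.TableRow;

    public class TableDataExporter {

        // Pages through a table with tabledata.list, following pageToken.
        static void exportTable(Bigquery bigquery) throws Exception {
            String pageToken = null;
            do {
                TableDataList page = bigquery.tabledata()
                        .list("myproject", "mydataset", "mytable")
                        .setMaxResults(10000L)   // effective page size is still capped by response size
                        .setPageToken(pageToken)
                        .execute();
                List<TableRow> rows = page.getRows();
                if (rows != null) {
                    // write rows to the output here
                }
                pageToken = page.getPageToken(); // null on the last page
            } while (pageToken != null);
        }
    }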

The ways I have found to speed this up so far:

Batch (not good in my case)

  • batch export to Google Cloud Storage

Parallelising requests

  • using startIndex (see the sketch after this list)
  • using dynamic table partitions
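A sketch of the startIndex variant: each worker reads its own contiguous row range, so no worker has to wait for another's pageToken. This assumes the total row count is already known (e.g. from tables.get); names and the shard logic are placeholders:

    import java.math.BigInteger;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import com.google.api.services.bigquery.Bigquery;
    import com.google.api.services.bigquery.model.TableDataList;
    import com.google.api.services.bigquery.model.TableRow;

    public class ParallelReader {

        // Splits [0, totalRows) into numWorkers contiguous ranges and reads them in parallel.
        static void readInParallel(Bigquery bigquery, long totalRows, int numWorkers) {
            ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
            long shardSize = (totalRows + numWorkers - 1) / numWorkers;
            for (int i = 0; i < numWorkers; i++) {
                long start = i * shardSize;
                long end = Math.min(start + shardSize, totalRows);
                pool.submit(() -> {
                    long index = start;
                    while (index < end) {
                        TableDataList page = bigquery.tabledata()
                                .list("myproject", "mydataset", "mytable")
                                .setStartIndex(BigInteger.valueOf(index))
                                .setMaxResults(end - index)
                                .execute();
                        List<TableRow> rows = page.getRows();
                        if (rows == null || rows.isEmpty()) {
                            break;
                        }
                        // process rows here; the API may return fewer rows than
                        // requested, so advance by what actually arrived
                        index += rows.size();
                    }
                    return null;
                });
            }
            pool.shutdown();
        }
    }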

It seems the most performant way for my use case is the last option, combined with the snapshot decorator to get a stable result if the table changes while I read it:

myproject:mydataset.mytable@timestamp$0-of-3
myproject:mydataset.mytable@timestamp$1-of-3
myproject:mydataset.mytable@timestamp$2-of-3
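To fan out over those decorated partitions, one possibility is one reader thread per decorated name, all pinned to the same snapshot timestamp. This is a sketch only: it assumes the tableId parameter of tabledata.list accepts the decorator string (as the question implies), and readPartition is just the pageToken loop from above:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import com.google.api.services.bigquery.Bigquery;
    import com.google.api.services.bigquery.model.TableDataList;

    public class PartitionedReader {

        // One worker per decorated partition, all at the same snapshot timestamp
        // so every worker sees the same table state.
        static void readPartitions(Bigquery bigquery, long snapshotMillis, int partitions)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(partitions);
            for (int i = 0; i < partitions; i++) {
                String tableId = String.format("mytable@%d$%d-of-%d", snapshotMillis, i, partitions);
                pool.submit(() -> {
                    readPartition(bigquery, "myproject", "mydataset", tableId);
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        // Standard pageToken loop over a single (decorated) table id.
        static void readPartition(Bigquery bigquery, String project, String dataset, String table)
                throws Exception {
            String pageToken = null;
            do {
                TableDataList page = bigquery.tabledata().list(project, dataset, table)
                        .setPageToken(pageToken)
                        .execute();
                // process page.getRows() here
                pageToken = page.getPageToken();
            } while (pageToken != null);
        }
    }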

So my questions are:

  1. Is there a better (= faster) approach?
  2. Do tabledata.list requests count against the limit of 50 concurrent requests?

1 Answer


You can first export the BigQuery table to Google Cloud Storage using the configuration.extract property of the jobs.insert method.
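A sketch of such an extract job with the same Java client (bucket, format, and names are placeholders; the returned job should then be polled via jobs.get until it is DONE):

    import java.util.Collections;

    import com.google.api.services.bigquery.Bigquery;
    import com.google.api.services.bigquery.model.Job;
    import com.google.api.services.bigquery.model.JobConfiguration;
    import com.google.api.services.bigquery.model.JobConfigurationExtract;
    import com.google.api.services.bigquery.model.TableReference;

    public class ExtractToGcs {

        // Submits an extract job that writes the table as gzipped CSV shards to GCS.
        static Job extractTable(Bigquery bigquery) throws Exception {
            JobConfigurationExtract extract = new JobConfigurationExtract()
                    .setSourceTable(new TableReference()
                            .setProjectId("myproject")
                            .setDatasetId("mydataset")
                            .setTableId("mytable"))
                    // The * wildcard shards the output into multiple files,
                    // which also parallelises the export itself.
                    .setDestinationUris(Collections.singletonList("gs://mybucket/mytable-*.csv.gz"))
                    .setDestinationFormat("CSV")
                    .setCompression("GZIP");
            Job job = new Job().setConfiguration(new JobConfiguration().setExtract(extract));
            return bigquery.jobs().insert("myproject", job).execute();
        }
    }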

Then you can download the exported file(s) to a location of your choice, for example with gsutil or the Cloud Storage API.
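For the download step, a sketch using the google-api-services-storage client (bucket and object names are placeholders, and the Storage instance is assumed to be authenticated already):

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import com.google.api.services.storage.Storage;

    public class GcsDownloader {

        // Streams one exported shard from GCS to a local file.
        static void download(Storage storage) throws Exception {
            try (OutputStream out = new FileOutputStream("mytable-000000000000.csv.gz")) {
                storage.objects()
                        .get("mybucket", "mytable-000000000000.csv.gz")
                        .executeMediaAndDownloadTo(out);
            }
        }
    }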