5
votes

I have many TBs of data in about 1 million tables, spread across multiple datasets in a single BigQuery project, all located in the US. I need to move all of this data to datasets hosted in the EU. What is my best option for doing so?

  • I could export the tables to Google Cloud Storage and re-import them using load jobs, but there's a 10K limit on load jobs per project per day
  • I could run queries with "allow large results" and save to a destination table, but query jobs can't write to a destination table in a different region

The only option I see right now is to re-insert all of the data using the BigQuery streaming API, which would be cost-prohibitive.

What's the best way to move a large volume of data in many tables cross-region in BigQuery?


3 Answers

4
votes

You have a couple of options:

  1. Use load jobs, and contact Google Cloud Support to ask for a quota exception. They're likely to grant 100k or so on a temporary basis (if not, contact me, tigani@google, and I can do so).
  2. Use federated query jobs. That is, move the data into a GCS bucket in the EU, then re-import the data via BigQuery queries with GCS data sources (a rough sketch is included at the end of this answer). More info here.

I'll also look into whether we can increase this quota limit across the board.
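
For anyone who wants to script option 2, here is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, table, and bucket names are placeholders, and in practice you would loop this over all of your tables:

```python
# Minimal sketch of option 2 using the google-cloud-bigquery Python client.
# All project, dataset, table, and bucket names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1) Export the table to Avro files in GCS. The files need to end up in an EU
#    bucket before step 2 -- either export them there directly (the US
#    multi-region is generally allowed to export to buckets in other
#    locations) or export to a colocated US bucket and copy the files over
#    with gsutil or the Storage Transfer Service.
extract_job = client.extract_table(
    "my-project.us_dataset.my_table",
    "gs://my-eu-bucket/my_table/*.avro",
    job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
    location="US",  # extract jobs run in the source dataset's location
)
extract_job.result()

# 2) Re-import with a federated query: define an external table over the
#    files in the EU bucket and write the result into a table in an EU
#    dataset. Query jobs don't count against the daily load-job quota.
ext_config = bigquery.ExternalConfig("AVRO")
ext_config.source_uris = ["gs://my-eu-bucket/my_table/*.avro"]

query_config = bigquery.QueryJobConfig(
    table_definitions={"my_table_ext": ext_config},
    destination="my-project.eu_dataset.my_table",
    write_disposition="WRITE_TRUNCATE",
)
query_job = client.query(
    "SELECT * FROM my_table_ext",
    job_config=query_config,
    location="EU",  # run the query where the destination dataset lives
)
query_job.result()
```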

4
votes

You can copy a dataset using BigQuery's Copy Dataset feature (in-region or cross-region). The copy dataset UI is similar to copy table: just click the "Copy dataset" button on the source dataset and specify the destination dataset in the pop-up form. See the screenshot below. Check out the public documentation for more use cases.

[Screenshot: the "Copy dataset" dialog in the BigQuery web UI]
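
If you'd rather script the copy than click through the UI, the Copy Dataset feature is backed by the BigQuery Data Transfer Service, so you can also create the cross-region copy programmatically. A rough sketch with placeholder project and dataset names (the destination dataset must already exist in the EU):

```python
# Rough sketch using the google-cloud-bigquery-datatransfer client.
# Project and dataset names below are placeholders.
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="eu_dataset",  # must already exist, in the EU
    display_name="US to EU dataset copy",
    data_source_id="cross_region_copy",
    params={
        "source_project_id": "my-project",
        "source_dataset_id": "us_dataset",
    },
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("my-project"),
    transfer_config=transfer_config,
)
print("Created transfer config:", transfer_config.name)
```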

0
votes

A few other options have become available since Jordan answered a few years ago, and they might be useful for some folks:

  1. Use Cloud Composer to orchestrate the export and load via GCS buckets (a rough sketch follows at the end of this answer). See here.
  2. Use Cloud Dataflow to orchestrate the export and load via GCS buckets. See here.

Disclaimer: I wrote the article for the 2nd option (using Cloud Dataflow).
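
To make option 1 concrete, here is a rough sketch of a Cloud Composer (Airflow) DAG that exports one table to GCS and loads it into an EU dataset, using the Google provider's transfer operators. All names are placeholders, and a real pipeline would generate these task pairs for many tables:

```python
# Rough sketch of option 1: a Cloud Composer (Airflow) DAG that exports one
# table to GCS and loads it into an EU dataset. Names are placeholders.
# This assumes the US multi-region source can export straight into an EU
# bucket; otherwise add a GCS-to-GCS copy task in between.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="bq_us_to_eu_copy",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually (use schedule_interval on Airflow < 2.4)
    catchup=False,
) as dag:
    export_to_gcs = BigQueryToGCSOperator(
        task_id="export_my_table",
        source_project_dataset_table="my-project.us_dataset.my_table",
        destination_cloud_storage_uris=["gs://my-eu-bucket/my_table/*.avro"],
        export_format="AVRO",
    )

    load_into_eu = GCSToBigQueryOperator(
        task_id="load_my_table",
        bucket="my-eu-bucket",
        source_objects=["my_table/*.avro"],
        destination_project_dataset_table="my-project.eu_dataset.my_table",
        source_format="AVRO",
        write_disposition="WRITE_TRUNCATE",
    )

    export_to_gcs >> load_into_eu
```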