
The gcloud SDK command "bq load" can take a local file as input.

From the output of the command, it looks like the file is first uploaded to Google Cloud Storage somewhere before the BigQuery load job is scheduled. Given that the BigQuery REST API's load-job endpoint also accepts only "gs://" URLs, and that the load job needs the data to be reachable, I am fairly sure such an upload to Cloud Storage is taking place (though I can't find any documentation that explicitly describes "bq load" with local files).

My question then is: can someone tell me where the local file is temporarily uploaded? Is it one of the project's Cloud Storage buckets, or somewhere else? Is it guaranteed to be deleted after the load job completes?

I have a requirement that data be kept only in a specific geographical region, so the location of the (presumed) temporary storage is significant.

I could upload the data explicitly to Cloud Storage, then run "bq load" with a reference to the storage object, but I would then need to arrange deletion of the data afterwards, which is a minor inconvenience. A dedicated bucket with a lifecycle rule could at least delete the data after one day, but the "bq load .. localfile" approach is cleaner.
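For completeness, the explicit-upload workaround can be sketched as a small script. The bucket, dataset, and table names below are placeholders, and the gsutil/bq invocations assume an installed and authenticated gcloud SDK:

```shell
# Lifecycle rule: delete staged objects after 1 day, as a safety net
# in case the explicit cleanup step below ever fails.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 1}}
  ]
}
EOF

# The remaining steps need the gcloud SDK; guard them so the sketch
# is safe to paste on a machine without it.
if command -v gsutil >/dev/null 2>&1; then
  # One-time setup: a bucket pinned to the EU, with the lifecycle rule attached.
  gsutil mb -l EU gs://my-eu-staging            # placeholder bucket name
  gsutil lifecycle set lifecycle.json gs://my-eu-staging

  # Per load: stage the file, run the load job in the EU, then delete at once.
  gsutil cp data.csv gs://my-eu-staging/data.csv
  bq load --location=EU mydataset.mytable gs://my-eu-staging/data.csv
  gsutil rm gs://my-eu-staging/data.csv
fi
```

The lifecycle rule is belt-and-braces: the `gsutil rm` normally removes the staged file immediately, and the rule mops up anything left behind by a failed run.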


1 Answer


If you run bq --help you can see that one of the global bq_flags is --location. It is defined as follows:

--location: “Default geographic location to use when creating datasets or determining where jobs should run (Ignored when not applicable.)”

If you run:

bq load --location=eu {your-table} {your-source} 

for a dataset located in the EU, then the job should succeed and all related jobs should run in the EU.
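To check which region a given dataset (and therefore its load jobs) is pinned to, you can inspect the dataset's metadata. The dataset name below is a placeholder, and the command assumes an installed gcloud SDK:

```shell
# Print the dataset's metadata as JSON; the "location" field
# shows the region the dataset lives in (e.g. "EU").
DATASET="mydataset"   # placeholder dataset name
if command -v bq >/dev/null 2>&1; then
  bq show --format=prettyjson "${DATASET}"
fi
```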