
We are writing a Dataflow job to load JSON files from a Cloud Storage bucket into a BigQuery dataset. Both the storage bucket and the BigQuery dataset are in region X. However, a Dataflow regional endpoint is not available in region X; the nearest one is region Y. So I have set the Dataflow job region to Y and the worker zone to a zone within region X, and all the compute instances are indeed spun up in region X. The Dataflow job still fails with the error:

Cannot read and write in different locations: source: Y, destination: X

Both the temp location and staging location buckets are in region X.

We are using Apache Beam 2.17 with the Python SDK.

We are creating a Dataflow template and running it with the DataflowRunner. The template is also stored in region X.
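Roughly, this is how the options are passed when the template is built. Project ID, bucket names, table name, and the concrete zone below are placeholders for the real values:

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # All names below are placeholders for the real project, buckets and zone.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="Y",                                   # nearest Dataflow regional endpoint
        zone="X-a",                                   # a zone inside region X (Beam 2.17 uses --zone)
        temp_location="gs://bucket-in-x/temp",        # bucket created in region X
        staging_location="gs://bucket-in-x/staging",  # bucket created in region X
        template_location="gs://bucket-in-x/templates/json_to_bq",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadJSON" >> beam.io.ReadFromText("gs://bucket-in-x/input/*.json")
            | "Parse" >> beam.Map(json.loads)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",     # table assumed to exist
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )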


1 Answer


Staging buckets for Dataflow jobs need to be multi-regional, or at least in the same region as the BigQuery datasets they write to. You could work around this problem by hosting the data in the same location where the Dataflow job is running.

Please note that when reading from or writing to BigQuery, a Google Cloud Storage bucket is used for staging files. The job therefore fails when the Dataflow staging bucket is in a different region than the BigQuery dataset, which is likely what caused this error.
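For illustration, here is a minimal sketch with the Python SDK that keeps all staging locations in the dataset's region, including the bucket used for the BigQuery load files via WriteToBigQuery's custom_gcs_temp_location argument. Project, bucket, and table names are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

    options = PipelineOptions(runner="DataflowRunner")
    gcp = options.view_as(GoogleCloudOptions)
    gcp.project = "my-project"                         # placeholder project ID
    gcp.region = "Y"                                   # Dataflow regional endpoint
    gcp.temp_location = "gs://bucket-in-x/temp"        # bucket in the dataset's region
    gcp.staging_location = "gs://bucket-in-x/staging"  # bucket in the dataset's region

    with beam.Pipeline(options=options) as p:
        (
            p
            | beam.Create([{"id": 1}])                 # dummy record for illustration
            | beam.io.WriteToBigQuery(
                "my-project:dataset_in_x.my_table",    # table assumed to exist
                # keep the load-job staging files co-located with the dataset
                custom_gcs_temp_location="gs://bucket-in-x/bq_temp",
            )
        )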

Another thing: make sure that you have specified all required pipeline options (project, region, temp_location and staging_location). I hope it helps.