I wanted to check on a similar error to the one mentioned in this post: "https://stackguides.com/questions/37298504/google-dataflow-job-and-bigquery-failing-on-different-regions?rq=1"
I am facing a similar issue in my Dataflow job, where I get the error below:
2021-03-10T06:02:26.115216545Z Workflow failed. Causes: S01:Read File from GCS/Read+String To BigQuery Row+Write to BigQuery/NativeWrite failed., BigQuery import job "dataflow_job_15712075439082970546-B" failed., BigQuery job "dataflow_job_15712075439082970546-B" in project "whr-asia-datalake-prod" finished with error(s): errorResult: Cannot read and write in different locations: source: US, destination: asia-south1, error: Cannot read and write in different locations: source: US, destination: asia-south1
I get this error when I try to run the code using a Cloud Function trigger. Please find the Cloud Function code below. Both my source data and my target BigQuery dataset reside in asia-south1.
"""
Google cloud funtion used for executing dataflow jobs.
"""
from googleapiclient.discovery import build
import time
def df_load_function(file, context):
filesnames = [
'5667788_OPTOUT_',
'WHR_AD_EMAIL_CNSNT_RESP_'
]
# Check the uploaded file and run related dataflow jobs.
for i in filesnames:
if 'inbound/{}'.format(i) in file['name']:
print("Processing file: {filename}".format(filename=file['name']))
project = '<my project>'
inputfile = 'gs://<my bucket>/inbound/' + file['name']
job = 'df_load_wave1_{}'.format(i)
template = 'gs://<my bucket>/template/df_load_wave1_{}'.format(i)
location = 'us-central1'
dataflow = build('dataflow', 'v1b3', cache_discovery=False)
request = dataflow.projects().locations().templates().launch(
projectId=project,
gcsPath=template,
location=location,
body={
'jobName': job,
"environment": {
"workerZone": "us-central1-a"
}
}
)
# Execute the dataflowjob
response = request.execute()
job_id = response["job"]["id"]
I have set the location to us-central1 and the workerZone to us-central1-a. I need to run my Dataflow job in us-central1 due to some resource constraints, but read and write the data in asia-south1. What else do I need to add in the Cloud Function so that the region and zone stay us-central1 while the data is read and written in asia-south1?
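For reference, the sketch below is the kind of environment block I am wondering whether I need to pass when launching the template. This is only a guess: the gs://<asia-south1 bucket>/temp path is a placeholder for a temp bucket that would be created in asia-south1, and I am not sure whether setting tempLocation alone is the right fix.

    # Sketch only: same launch call as in the Cloud Function above, but with a
    # tempLocation pointing at a bucket created in asia-south1 (placeholder name).
    request = dataflow.projects().locations().templates().launch(
        projectId=project,
        gcsPath=template,
        location=location,
        body={
            'jobName': job,
            'environment': {
                'workerZone': 'us-central1-a',
                'tempLocation': 'gs://<asia-south1 bucket>/temp'
            }
        }
    )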
However, when I run the job manually from Cloud Shell using the command below, it works fine and the data is loaded. Here, too, the region and zone are us-central1:
python -m <python script where the data is read from bucket and load big query> \
--project <my_project> \
--region us-central1 \
--runner DataflowRunner \
--staging_location gs://<bucket_name>/staging \
--temp_location gs://<bucket_name>/temp \
--subnetwork https://www.googleapis.com/compute/v1/projects/whr-ios-network/regions/us-central1/subnetworks/<subnetwork> \
--network projects/whr-ios-network/global/networks/<name> \
--zone us-central1-a \
--save_main_session
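For context, the pipeline inside that script is roughly of the shape below (heavily simplified; the bucket, dataset, and table names are placeholders), matching the step names that appear in the error message:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def to_bq_row(line):
        # Placeholder parse: turn one input line into a BigQuery row dict.
        fields = line.split(',')
        return {'col1': fields[0], 'col2': fields[1]}


    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'Read File from GCS' >> beam.io.ReadFromText('gs://<my bucket>/inbound/<file>')
         | 'String To BigQuery Row' >> beam.Map(to_bq_row)
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
               '<my project>:<dataset in asia-south1>.<table>',
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))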
Can anyone please help? I have been struggling with this issue.