I wanted to check on a similar error to the one mentioned in this post: "https://stackguides.com/questions/37298504/google-dataflow-job-and-bigquery-failing-on-different-regions?rq=1"
I am facing a similar issue in my Dataflow job, where I get the error below:
2021-03-10T06:02:26.115216545Z Workflow failed. Causes: S01:Read File from GCS/Read+String To BigQuery Row+Write to BigQuery/NativeWrite failed., BigQuery import job "dataflow_job_15712075439082970546-B" failed., BigQuery job "dataflow_job_15712075439082970546-B" in project "whr-asia-datalake-prod" finished with error(s): errorResult: Cannot read and write in different locations: source: US, destination: asia-south1, error: Cannot read and write in different locations: source: US, destination: asia-south1
I get this error when I try to run the code using a Cloud Function trigger. Please find the Cloud Function code below. Both my source data and my target BigQuery dataset reside in asia-south1.
"""
Google cloud funtion used for executing dataflow jobs.
"""
from googleapiclient.discovery import build
import time
def df_load_function(file, context):
filesnames = [
'5667788_OPTOUT_',
'WHR_AD_EMAIL_CNSNT_RESP_'
]
# Check the uploaded file and run related dataflow jobs.
for i in filesnames:
if 'inbound/{}'.format(i) in file['name']:
print("Processing file: {filename}".format(filename=file['name']))
project = '<my project>'
inputfile = 'gs://<my bucket>/inbound/' + file['name']
job = 'df_load_wave1_{}'.format(i)
template = 'gs://<my bucket>/template/df_load_wave1_{}'.format(i)
location = 'us-central1'
dataflow = build('dataflow', 'v1b3', cache_discovery=False)
request = dataflow.projects().locations().templates().launch(
projectId=project,
gcsPath=template,
location=location,
body={
'jobName': job,
"environment": {
"workerZone": "us-central1-a"
}
}
)
# Execute the dataflowjob
response = request.execute()
job_id = response["job"]["id"]
I have set the location to us-central1 and the workerZone to us-central1-a. I need to run my Dataflow job in us-central1 due to some resource constraints, but read and write the data in asia-south1. What else do I need to add in the Cloud Function so that the region and zone stay us-central1 while the data is read and written in asia-south1?
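For reference, the sketch below is the kind of environment block I am wondering whether I need to pass when launching the template. This is only a guess: the gs://<asia-south1 bucket>/temp path is a placeholder for a temp bucket that would be created in asia-south1, and I am not sure whether setting tempLocation alone is the right fix.

    # Sketch only: same launch call as in the Cloud Function above, but with a
    # tempLocation pointing at a bucket created in asia-south1 (placeholder name).
    request = dataflow.projects().locations().templates().launch(
        projectId=project,
        gcsPath=template,
        location=location,
        body={
            'jobName': job,
            'environment': {
                'workerZone': 'us-central1-a',
                'tempLocation': 'gs://<asia-south1 bucket>/temp'
            }
        }
    )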
However, when I run the job manually from Cloud Shell using the command below, it works fine and the data is loaded. Here, too, the region and zone are us-central1:
python -m <python script where the data is read from bucket and load big query> \
--project <my_project> \
--region us-central1 \
--runner DataflowRunner \
--staging_location gs://<bucket_name>/staging \
--temp_location gs://<bucket_name>/temp \
--subnetwork https://www.googleapis.com/compute/v1/projects/whr-ios-network/regions/us-central1/subnetworks/<subnetwork> \
--network projects/whr-ios-network/global/networks/<name> \
--zone us-central1-a \
--save_main_session
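For context, the pipeline inside that script is roughly of the shape below (heavily simplified; the bucket, dataset, and table names are placeholders), matching the step names that appear in the error message:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def to_bq_row(line):
        # Placeholder parse: turn one input line into a BigQuery row dict.
        fields = line.split(',')
        return {'col1': fields[0], 'col2': fields[1]}


    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'Read File from GCS' >> beam.io.ReadFromText('gs://<my bucket>/inbound/<file>')
         | 'String To BigQuery Row' >> beam.Map(to_bq_row)
         | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
               '<my project>:<dataset in asia-south1>.<table>',
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))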
Can anyone please help? I have been struggling with this issue.