
I am trying to use DataFlowJavaOperator in our test Cloud Composer environment, but I am running into a 403 Forbidden error. My intention is to kick off a Dataflow Java job in a different project from the one the test Composer environment lives in.

from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

t2 = DataFlowJavaOperator(
    task_id="run-java-dataflow-job",
    jar="gs://path/to/dataflow-jar.jar",
    dataflow_default_options=config_params["dataflow_default_options"],
    gcp_conn_id=config_params["gcloud_config"]["conn_id"],
    dag=dag,
)

My dataflow_default_options look like this:

'dataflow_default_options': {
    'project': 'other-project',
    'input': 'other-project:dataset.table',
    'output': 'other-project:dataset.table',
    ...
}

I have tried creating a temporary Composer test environment in the same project as the Dataflow job, and there DataFlowJavaOperator works as expected. It only fails when the Composer environment resides in a different project than the Dataflow job.

My current workaround is to use BashOperator: I set GOOGLE_APPLICATION_CREDENTIALS via the env parameter to the path of the key file used by the gcp_conn_id connection, store the jar file in our test Composer bucket, and run this bash command:

java -jar /path/to/dataflow-jar.jar \
    [... all Dataflow job options]
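
For reference, here is a minimal sketch of how that workaround looks wired into the DAG. The key file path, jar location, and job options below are placeholders, not values from my actual environment:

from airflow.operators.bash_operator import BashOperator

# Sketch of the BashOperator workaround: run the Dataflow jar directly
# and point the Java process at a service account key for the other
# project via GOOGLE_APPLICATION_CREDENTIALS. Paths are hypothetical.
t2_workaround = BashOperator(
    task_id="run-java-dataflow-job-bash",
    bash_command=(
        "java -jar /home/airflow/gcs/data/dataflow-jar.jar "
        "--project=other-project --runner=DataflowRunner"
    ),
    env={"GOOGLE_APPLICATION_CREDENTIALS": "/path/to/keyfile.json"},
    dag=dag,
)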

Is it possible to use DataFlowJavaOperator to kick off Dataflow jobs in another project?


1 Answer


You need a separate GCP connection, created for Composer to interact with your second GCP project, and you need to pass that connection id to gcp_conn_id in DataFlowJavaOperator. The connection should use a service account key that has the required Dataflow permissions in the second project.
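
As a sketch, assuming a connection with id google_cloud_other_project has already been created (for example under Admin > Connections in the Airflow UI) with a service account authorized in the second project, the operator call changes only in its gcp_conn_id:

from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

# "google_cloud_other_project" is a hypothetical connection id; create it
# in Airflow with a service account key from the second project.
t2 = DataFlowJavaOperator(
    task_id="run-java-dataflow-job",
    jar="gs://path/to/dataflow-jar.jar",
    dataflow_default_options=config_params["dataflow_default_options"],
    gcp_conn_id="google_cloud_other_project",
    dag=dag,
)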