1
votes

I am following this quickstart to run a Dataflow job: https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven

It works fine when I run the mvn command from Google Cloud Shell.

  mvn compile exec:java \
      -Dexec.mainClass=com.example.WordCount \
      -Dexec.args="--project=<my-cloud-project> \
      --stagingLocation=gs://<my-wordcount-storage-bucket>/staging/ \
      --output=gs://<my-wordcount-storage-bucket>/output \
      --runner=DataflowRunner"

But when I launch a VM and run the same command from it, I get a permission denied error. If I give the VM full API access, the command runs successfully.

What permissions should I give the VM to run Dataflow jobs, or should I use a service account instead?

Can anybody tell me the best way to run Dataflow jobs in a production environment?

Regards, pari

3
Before I answer your question...why do you want to run it from a VM? - rish0097
I need to run a batch job from Airflow, so I have to install Airflow on a Compute Engine instance, and that VM needs access to run Dataflow jobs. Correct me if I'm not making sense. - Pari

3 Answers

0
votes

You will have to give the VM the Dataflow Admin role to run the Dataflow job. Additionally, if your Dataflow job involves BigQuery, you'll have to grant the BigQuery Editor role, and so on.
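
For example, a minimal gcloud sketch (the service account e-mail and project ID below are placeholders, and I'm using BigQuery Data Editor as one possible BigQuery role):

  # Grant the Dataflow Admin role to the service account the VM runs as
  gcloud projects add-iam-policy-binding my-cloud-project \
      --member="serviceAccount:dataflow-runner@my-cloud-project.iam.gserviceaccount.com" \
      --role="roles/dataflow.admin"

  # If the job reads/writes BigQuery, grant a BigQuery role as well
  gcloud projects add-iam-policy-binding my-cloud-project \
      --member="serviceAccount:dataflow-runner@my-cloud-project.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataEditor"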

You can also create a service account and grant it the required roles to run the job. Hope this helps.

0
votes

To provide granular access, you can take advantage of Dataflow Roles:

  • Developer. Executes and manipulates Dataflow jobs.
  • Viewer. Read-only access to Dataflow-related resources.
  • Worker. Provides the permissions for a service account to execute work units for a Dataflow pipeline.

When you have an automated application that needs to execute Dataflow jobs without user intervention, it is recommended to use a service account.
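
For example, a rough gcloud sketch granting the Worker and Developer roles to a service account (the project ID and account name below are placeholders):

  # Lets the service account execute work units for a Dataflow pipeline
  gcloud projects add-iam-policy-binding my-cloud-project \
      --member="serviceAccount:dataflow-sa@my-cloud-project.iam.gserviceaccount.com" \
      --role="roles/dataflow.worker"

  # Lets it create and manipulate Dataflow jobs
  gcloud projects add-iam-policy-binding my-cloud-project \
      --member="serviceAccount:dataflow-sa@my-cloud-project.iam.gserviceaccount.com" \
      --role="roles/dataflow.developer"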

In fact, for a production environment my recommendation is to create an automated process that deploys and executes your pipeline. You can take advantage of Cloud Composer, which is based on Apache Airflow and can launch Dataflow jobs (Composer 1.0.0 or later, in supported Dataflow regions).
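
As a quick sketch, a Composer environment can be created with gcloud (the environment name and location below are placeholders; pick a region supported by Dataflow):

  gcloud composer environments create my-composer-env \
      --location=us-central1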

0
votes

If you are using Airflow, create a service account with access to the components your Dataflow job uses, and create a Connection in the Airflow UI with the required scopes. Once that is done, use DataflowJavaOperator/DataflowTemplateOperator to submit the job and orchestrate it via Airflow.
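
As a rough sketch, assuming the service account already has the needed Dataflow roles (the names and paths below are placeholders), you can create a key for it and point the Airflow GCP connection at that key:

  # Download a JSON key for the service account
  gcloud iam service-accounts keys create /home/airflow/dataflow-sa-key.json \
      --iam-account=dataflow-runner@my-cloud-project.iam.gserviceaccount.com

  # Then, in the Airflow UI (Admin > Connections), edit the Google Cloud
  # Platform connection (e.g. google_cloud_default) and set:
  #   Project Id:   my-cloud-project
  #   Keyfile Path: /home/airflow/dataflow-sa-key.json
  #   Scopes:       https://www.googleapis.com/auth/cloud-platform

The Dataflow operators will then use that connection (they take a connection-id parameter that defaults to google_cloud_default).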

If you need further help, comment on this answer.