2
votes

We are using Apache Beam through Airflow. The default GCS account is set with the environment variable GOOGLE_APPLICATION_CREDENTIALS. We don't want to change that environment variable, as it might affect other processes running at the same time. I couldn't find a way to change the Google Cloud Dataflow service account programmatically. We create the pipeline in the following way: p = beam.Pipeline(argv=self.conf)

Is there any option, through argv or the pipeline options, where I can specify the location of the GCS credential file? I searched through the documentation but didn't find much information.


1 Answer

4
votes

You can specify a service account when you launch the job with a basic flag: --serviceAccount=my-service-account-name@my-project.iam.gserviceaccount.com (that is the Java SDK spelling; in the Python SDK the same option is --service_account_email).

That account will need the Dataflow Worker role attached, plus whatever else your pipeline uses (GCS, BQ, etc.). Details here. You don't need the service account's keys stored in GCS or locally to use it.
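
Since the question builds the pipeline from an argv list in Python, here is a minimal sketch of how the flag could be added there without touching GOOGLE_APPLICATION_CREDENTIALS. It assumes the Python SDK option name --service_account_email; the helper names with_service_account and build_pipeline are made up for illustration and are not part of Beam.

```python
from typing import List, Sequence

# Python SDK spelling of the Dataflow worker service account option.
FLAG = "--service_account_email"


def with_service_account(argv: Sequence[str], email: str) -> List[str]:
    """Return a copy of argv with the worker service account set.

    Hypothetical helper: any existing --service_account_email entry
    (either "--flag=value" or "--flag value") is dropped so the new
    value is the only one passed on.
    """
    cleaned: List[str] = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This is the value that followed a bare "--service_account_email".
            skip_next = False
            continue
        if arg == FLAG:
            skip_next = True
            continue
        if arg.startswith(FLAG + "="):
            continue
        cleaned.append(arg)
    return cleaned + [f"{FLAG}={email}"]


def build_pipeline(argv: Sequence[str], email: str):
    """Create the pipeline the same way as in the question, with the flag added."""
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam

    return beam.Pipeline(argv=with_service_account(argv, email))
```

In the question's setup this would be called as p = build_pipeline(self.conf, "my-service-account-name@my-project.iam.gserviceaccount.com"). The input list is copied rather than modified, so other Airflow tasks sharing self.conf are not affected.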