0
votes

I created a service account and assigned it the Dataflow Developer role, the Compute Viewer role, and the Storage Object Admin role on the temporary bucket. Then, with my admin user account (which has the Project Owner role), I created another bucket named gs://outputbucket. Finally, I submitted a Dataflow job with the following commands:

export GOOGLE_APPLICATION_CREDENTIALS=<path-to-credential>
TMPBUCKET=temporarybucket
OUTBUCKET=outputbucket
PROJECT=myprojectid

python -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$OUTBUCKET/wordcount/outputs \
  --runner DataflowRunner \
  --project $PROJECT \
  --temp_location gs://$TMPBUCKET/tmp/

Similarly, I created a Dataflow job from the existing Cloud Pub/Sub to BigQuery template. It can write to any table in the same project, even though the service account was never granted any BigQuery permissions. Can anyone explain how this is possible?
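For reference, the template job is launched roughly like this (the job name, topic, dataset, and table below are placeholders, not my actual resources):

# Pub/Sub to BigQuery classic template (placeholder topic and table names)
gcloud dataflow jobs run ps-to-bq-test \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --region us-central1 \
  --parameters inputTopic=projects/$PROJECT/topics/my-topic,outputTableSpec=$PROJECT:my_dataset.my_table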

Furthermore, is this a potential security issue according to Google's Principle of Least Privilege?


2
In your code example, are you using your new service account to run your dataflow? - guillaume blaquiere
I set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the service account's credential JSON file; I've updated the command in the question accordingly. - van can

2 Answers

1
votes

The service account that you use with this command

export GOOGLE_APPLICATION_CREDENTIALS=<path-to-credential>

is used for creating the Dataflow pipeline, but it's not used by the pipeline workers.

For that, you have to pass the service account email in the pipeline options, as described here. Otherwise, the Compute Engine default service account is used.
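For example, here is a minimal sketch of the wordcount submission from the question with the worker (controller) service account set explicitly; the service account email below is a placeholder you would replace with your own:

# same wordcount job, but with an explicit controller service account for the workers
python -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$OUTBUCKET/wordcount/outputs \
  --runner DataflowRunner \
  --project $PROJECT \
  --temp_location gs://$TMPBUCKET/tmp/ \
  --service_account_email my-dataflow-sa@$PROJECT.iam.gserviceaccount.com

That service account then needs the Dataflow Worker role, plus access to the buckets (and any BigQuery tables) the pipeline touches.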

0
votes

It depends on which service account (or user, I guess) is used to run the job.

As per the Google documentation on the Dataflow controller service account: by default, workers use your project's Compute Engine default service account as the controller service account.
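That default account typically has the broad Editor role on the project (unless your organization has changed the automatic role grants), which would explain why the workers can write to any BigQuery table. A sketch of how you could check which roles it currently holds, assuming PROJECT_NUMBER is your numeric project number:

# list the roles bound to the Compute Engine default service account
gcloud projects get-iam-policy $PROJECT \
  --flatten="bindings[].members" \
  --filter="bindings.members:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --format="table(bindings.role)"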