0
votes

Say I have the following case:

  • I have to run some tests with Dataflow.
  • Inside this Dataflow job I need to access a GCS bucket and save my output there.
  • I need to run the Dataflow job with my own SA instead of the default SA.

I created a Google service account to run my Dataflow job, but after I enabled the Dataflow API I ended up with two service accounts in front of me.

I got really confused by what the official documentation says:

Some Google Cloud services have Google-managed service accounts that allow the services to access your resources. These service accounts are sometimes known as service agents.

If I create a Dataflow job to run with the [email protected] SA, I suppose I'd need to grant it roles/storage.objectAdmin.

My questions are:

  • Do I need to grant any permissions to the service agent?
  • What does the service agent actually do, and why does it need access to any resources?

1 Answer

2
votes

Several Google Cloud services such as Cloud Dataflow require two sets of permissions.

The program that you write uses a service account. You grant this service account the IAM roles it needs to access the resources your program uses, for example reading data from Cloud Storage or issuing queries to BigQuery.
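
As a rough illustration of this first set of permissions, the sketch below assembles the Beam Python pipeline flags for running a job as a user-managed service account. The project ID, region, bucket, and account name are hypothetical placeholders, and `dataflow_args` is a helper made up for this example; `--service_account_email` is the pipeline option that selects the worker service account.

```python
def dataflow_args(project, region, bucket, worker_sa):
    """Build pipeline flags for a Dataflow job that runs as worker_sa."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        # Dataflow stages temporary files in this GCS location.
        f"--temp_location=gs://{bucket}/tmp",
        # The user-managed account the worker VMs will run as.
        f"--service_account_email={worker_sa}",
    ]


# Hypothetical values for illustration only.
args = dataflow_args(
    project="my-project",
    region="us-central1",
    bucket="my-output-bucket",
    worker_sa="my-dataflow-sa@my-project.iam.gserviceaccount.com",
)
print("\n".join(args))
```

A list like this can be passed to `apache_beam.options.pipeline_options.PipelineOptions(args)`. The account named there is the one that needs the data-access roles (and, as far as I know, roles/dataflow.worker as well).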

The service agent applies to the service's runtime. For example, when you launch a job on Cloud Dataflow, Cloud Dataflow needs to launch VMs to run your program on. Your program is not launching the VMs; the service is. Therefore the service requires its own set of permissions. This is what the service agent is for.
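
To make the distinction concrete, here is a small sketch listing which identity would hold which role. It assumes the usual naming of the Dataflow service agent (service-PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com) and uses a hypothetical project number and worker account. The service agent normally receives roles/dataflow.serviceAgent automatically when the API is enabled, so that row is shown for reference rather than as something to grant by hand.

```python
def dataflow_service_agent(project_number):
    """Return the conventional email of the Dataflow service agent."""
    return (
        f"service-{project_number}"
        "@dataflow-service-producer-prod.iam.gserviceaccount.com"
    )


def planned_bindings(project_number, worker_sa):
    """List (member, role, scope) tuples for the two identities."""
    agent = dataflow_service_agent(project_number)
    return [
        # Your program's identity: runs the pipeline code on the workers.
        (worker_sa, "roles/dataflow.worker", "project"),
        (worker_sa, "roles/storage.objectAdmin", "output bucket"),
        # The service's identity: creates and manages the worker VMs.
        (agent, "roles/dataflow.serviceAgent", "project"),
    ]


# Hypothetical values for illustration only.
bindings = planned_bindings(
    "123456789012", "my-dataflow-sa@my-project.iam.gserviceaccount.com"
)
for member, role, scope in bindings:
    print(f"{member} -> {role} on {scope}")
```

The Dataflow documentation also describes letting the service agent act on behalf of a user-managed worker account; check the current security and permissions page for the exact roles before relying on this list.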

By using two different service accounts, separation of privilege is achieved.