
We are running the open source product Airflow (https://airflow.apache.org/) on GKE. Processes running in the Airflow pods need to interact with GCP's Dataproc service in order to create Dataproc clusters. We are using Workload Identity for our GKE applications.
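
For reference, the Kubernetes service account is annotated to impersonate the Google service account in the usual Workload Identity fashion, roughly like this:

$ kubectl annotate serviceaccount dp-airflow \
> --namespace static11-dsp-dp-airflow \
> iam.gke.io/gcp-service-account=dp-airflow@mygcpproject.iam.gserviceaccount.com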

The Kubernetes Service Account (KSA) has been granted roles/iam.workloadIdentityUser on the Google Service Account (GSA) resource dp-airflow@mygcpproject.iam.gserviceaccount.com:

$ gcloud iam service-accounts get-iam-policy \
> dp-airflow@mygcpproject.iam.gserviceaccount.com \
> --format="table(bindings.role, bindings.members)" \
> --flatten="bindings[].members"
ROLE                            MEMBERS
roles/iam.workloadIdentityUser  serviceAccount:mygcpproject.svc.id.goog[static11-dsp-dp-airflow/dp-airflow]
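
That binding was put in place with roughly the following command:

$ gcloud iam service-accounts add-iam-policy-binding \
> dp-airflow@mygcpproject.iam.gserviceaccount.com \
> --role roles/iam.workloadIdentityUser \
> --member "serviceAccount:mygcpproject.svc.id.goog[static11-dsp-dp-airflow/dp-airflow]"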

And the GSA has been granted a custom role that we created, dsp_service_account_dataproc_v1:

$ gcloud projects get-iam-policy mygcpproject --format=json | grep dsp_service_account_dataproc_v1 -B 8 -A 1
    {
      "members": [
        "serviceAccount:[email protected]"
      ],
      "role": "projects/mygcpproject/roles/dsp_service_account_dataproc_v1"
    },
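
That grant corresponds to something along the lines of:

$ gcloud projects add-iam-policy-binding mygcpproject \
> --member "serviceAccount:dp-airflow@mygcpproject.iam.gserviceaccount.com" \
> --role projects/mygcpproject/roles/dsp_service_account_dataproc_v1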

That custom role has all the permissions required to call the Dataproc API:

$ gcloud iam roles describe dsp_service_account_dataproc_v1 --project mygcpproject
etag: BwWuudZzoGI=
includedPermissions:
- dataproc.agents.create
- dataproc.agents.delete
- dataproc.agents.get
- dataproc.agents.list
- dataproc.agents.update
- dataproc.clusters.create
- dataproc.clusters.delete
- dataproc.clusters.get
- dataproc.clusters.list
- dataproc.clusters.update
- dataproc.clusters.use
- dataproc.jobs.cancel
- dataproc.jobs.create
- dataproc.jobs.delete
- dataproc.jobs.get
- dataproc.jobs.list
- dataproc.jobs.update
- dataproc.operations.delete
- dataproc.operations.get
- dataproc.operations.list
- dataproc.tasks.lease
- dataproc.tasks.listInvalidatedLeases
- dataproc.tasks.reportStatus
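
For completeness, the role itself was created with the standard gcloud flow, roughly as follows (only a few of the permissions are shown here; the full list is above):

$ gcloud iam roles create dsp_service_account_dataproc_v1 \
> --project mygcpproject \
> --permissions "dataproc.agents.create,dataproc.clusters.create,dataproc.jobs.create"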

Yet when I try to create a Dataproc cluster by calling the Dataproc API from Airflow, it fails with:

<HttpError 400 when requesting https://dataproc.googleapis.com/v1beta2/projects/mygcpproject/regions/europe-west1/clusters?alt=json returned "User not authorized to act as service account 'dp-airflow@mygcpproject.iam.gserviceaccount.com'. To act as a service account, user must have one of [Owner, Editor, Service Account Actor] roles. See https://cloud.google.com/iam/docs/understanding-service-accounts for additional details.">

I assume I'm missing something somewhere, probably some additional role that needs to be granted, but I don't know where. Any suggestions would be much appreciated.


1 Answer


In this case the Workload Identity side of your setup appears to be fine, but it looks like you're also making the Dataproc cluster itself run as the calling service account (equivalent to calling gcloud dataproc clusters create --service-account dp-airflow@mygcpproject.iam.gserviceaccount.com when the caller is itself dp-airflow@mygcpproject.iam.gserviceaccount.com).

The error message is also a bit outdated: it should point to Service Account User instead of Service Account Actor, though technically the latter also works.

In general, the identity calling the Dataproc API must have the Service Account User role, either at the project level or directly on the service account that will be bound to the VMs of the Dataproc cluster, even if that service account is the same identity as the caller. The same applies to plain GCE VMs without Dataproc: for example, gcloud compute instances create --impersonate-service-account dp-airflow@mygcpproject.iam.gserviceaccount.com --service-account dp-airflow@mygcpproject.iam.gserviceaccount.com my-instance should fail with the same permission error.

So basically you just need to grant that service account the Service Account User role on itself, as sketched below.
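
Concretely, something along these lines should do it (substitute your actual GSA email):

$ gcloud iam service-accounts add-iam-policy-binding \
> dp-airflow@mygcpproject.iam.gserviceaccount.com \
> --role roles/iam.serviceAccountUser \
> --member "serviceAccount:dp-airflow@mygcpproject.iam.gserviceaccount.com"

Granting roles/iam.serviceAccountUser at the project level with gcloud projects add-iam-policy-binding also works, but the per-service-account grant is more narrowly scoped.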

As described on https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/service-accounts, this Service Account User role is needed in addition to the Dataproc Editor role.