
I am trying to create a Dataproc cluster with a service account via the Cloud SDK. It throws an error saying that compute.projects.get is denied. The service account has the Compute Viewer, Compute Instance Admin, and Dataproc Editor roles, so I can't understand why this error occurs. In the IAM Policy Troubleshooter, I checked that dataproc.clusters.create is assigned to the service account.

The command is:

gcloud dataproc clusters create cluster-dqm01 \
  --region europe-west2 \
  --zone europe-west2-b \
  --subnet dataproc-standalone-paasonly-europe-west2 \
  --master-machine-type n1-standard-4 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-standard-4 \
  --worker-boot-disk-size 500 \
  --image-version 1.3-deb9 \
  --project xxxxxx \
  --service-account xxxx.iam.gserviceaccount.com

ERROR: (gcloud.dataproc.clusters.create) PERMISSION_DENIED: Required 'compute.projects.get' permission for 'projects/xxxxxx'

The project is correct: I also tried creating the cluster from the console and got the same error, then generated the gcloud command from the console to run it with a service account. This is the first time a Dataproc cluster is being created in this project.

Are you running the gcloud command from inside a VM or from your own machine? If you type "gcloud auth list", does it show that you're acting as yourself or as a service account? Keep in mind that --service-account in gcloud dataproc clusters create refers to the service account that the Dataproc cluster itself will behave as when processing data. That's not the same service account that is used to create the VMs in the first place. - Dennis Huo
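
The identity check suggested in the comment above could be sketched roughly like this. `gcloud config get-value account` prints the account gcloud is currently acting as; the `identity_kind` helper is hypothetical and only inspects the email suffix, on the assumption that service account emails end in "gserviceaccount.com".

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an account email by its suffix.
# Assumption: service account emails end in "gserviceaccount.com".
identity_kind() {
  case "$1" in
    *gserviceaccount.com) echo "service account" ;;
    "")                   echo "no active account" ;;
    *)                    echo "user account" ;;
  esac
}

# Only query gcloud when it is installed. The active account is the
# credential gcloud uses to issue API calls such as "clusters create".
if command -v gcloud >/dev/null 2>&1; then
  active="$(gcloud config get-value account 2>/dev/null)"
  echo "Active account: ${active:-<none>} ($(identity_kind "$active"))"
fi
```

If this reports a user account, the roles granted to the service account named in --service-account play no part in authorizing the create call itself.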

1 Answer


If you assigned the various permissions to the same service account that you're passing with --service-account, the likely issue is that you meant to pass --impersonate-service-account instead.
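
As a rough sketch of that change, the create command can be issued while acting as the service account that holds the Compute and Dataproc roles. The project ID and service account email below are placeholders rather than values from the question, and the `run` wrapper is a hypothetical convenience that prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Placeholders, not values from the question.
PROJECT_ID="my-project"
CALLER_SA="deployer@${PROJECT_ID}.iam.gserviceaccount.com"

# Hypothetical wrapper: print the command by default, run it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

create_cluster() {
  # --impersonate-service-account makes gcloud issue the API call as
  # CALLER_SA, so the roles granted to CALLER_SA are the ones checked.
  run gcloud dataproc clusters create cluster-dqm01 \
    --project "$PROJECT_ID" \
    --region europe-west2 \
    --zone europe-west2-b \
    --num-workers 2 \
    --impersonate-service-account "$CALLER_SA"
}

create_cluster
```

For impersonation to work, the identity that runs gcloud needs the Service Account Token Creator role (roles/iam.serviceAccountTokenCreator) on that service account.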

There are three identities that are relevant here:

  1. The identity issuing the CreateCluster command - this is often a human identity, but if you're automating things, using --impersonate-service-account, or running the command from inside another GCE VM, it may itself be a service account.
  2. The "control plane" identity - this is what the Dataproc backend service uses to actually create VMs.
  3. The "data plane" identity - this is what the Dataproc workers behave as when processing data.

Typically, #1 and #2 need the various "compute" permissions and some minimal GCS permissions. #3 typically just needs GCS and optionally BigQuery, CloudSQL, Bigtable, etc. permissions depending on what you're actually processing.

See https://cloud.google.com/dataproc/docs/concepts/iam/dataproc-principals for more in-depth explanation of these identities.

It also lists the pre-existing curated roles to make this all easy (and typically, "default" project settings will automatically have the correct roles already so that you don't have to worry about it). Basically, the "human identity" or the service account you use with --impersonate-service-account needs Dataproc Editor or Project Editor roles, the "control plane identity" needs Dataproc Service Agent, and the "data plane identity" needs Dataproc Worker.
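
If the default bindings are missing, the role assignments described above might be granted along these lines. This is a sketch under assumptions: the project ID, project number, and both service account emails are placeholders, the role IDs are the ones that correspond to Dataproc Editor, Dataproc Service Agent, and Dataproc Worker, and the `run` wrapper is a hypothetical dry-run helper.

```shell
#!/usr/bin/env bash
# Placeholders: substitute your own project and accounts.
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789012"
CALLER_SA="deployer@${PROJECT_ID}.iam.gserviceaccount.com"     # identity 1
WORKER_SA="dataproc-vm@${PROJECT_ID}.iam.gserviceaccount.com"  # identity 3
# Identity 2: the Dataproc service agent, named after the project number.
SERVICE_AGENT="service-${PROJECT_NUMBER}@dataproc-accounts.iam.gserviceaccount.com"

# Hypothetical wrapper: print the command by default, run it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# grant <service-account-email> <role-id>
grant() {
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:$1" --role "$2"
}

grant_dataproc_roles() {
  grant "$CALLER_SA"     roles/dataproc.editor
  grant "$SERVICE_AGENT" roles/dataproc.serviceAgent
  grant "$WORKER_SA"     roles/dataproc.worker
}

grant_dataproc_roles
```

With the defaults the script only prints the three commands, which makes it easy to review the bindings before applying them with DRY_RUN=0.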