
I am new to Google Cloud and am evaluating Dataproc clusters; one of the core requirements is to dynamically create a cluster and process jobs on it. From various documentation reads and links, I attempted this by creating a service account and adding roles, starting with "Dataproc Editor".

I generated the key file and activated the service account:

gcloud auth activate-service-account --key-file=<Key File>

and tried to create a cluster:

gcloud beta dataproc clusters create jill-cluster \
    --enable-component-gateway \
    --subnet default \
    --zone europe-west3-b \
    --region europe-west3 \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 50 \
    --num-workers 2 \
    --worker-machine-type n1-standard-4 \
    --worker-boot-disk-type pd-ssd \
    --worker-boot-disk-size 100 \
    --image https://compute.googleapis.com/compute/v1/projects/poc/global/images/poc-1-5-1-debina10 \
    --scopes 'https://www.googleapis.com/auth/cloud-platform' \
    --project poc \
    --verbosity info \
    --autoscaling-policy=poc-auto-scale-policy \
    --service-account=<Service account>

I am getting this error:

{
    "code": 403,
    "message": "Not authorized to requested resource.",
    "status": "PERMISSION_DENIED"
}

I then started adding more roles to the service account and ended up with the long list shown in the screenshot, but I am still unable to create a cluster. I am not quite sure what I am missing. I tried the command line as well as a programmatic approach, both with the same result. Unfortunately, I could not get enough clues from the logging either.

---------- Update ---------------

I think I missed some info in my initial question. I have a user account with the Owner role and am currently using that account to experiment; with it I can create clusters and submit jobs. So I think the project itself has everything required enabled.

Now I want to move towards automation, where I want to achieve the following:

  1. Manage the cluster using a service account.
  2. Submit, run, and manage jobs with that same account.

I started with a single service account for both responsibilities, but as suggested I can split these across different service accounts.
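
For reference, the end-to-end flow I'm aiming for looks roughly like this sketch (the job class and GCS jar path are placeholders; the cluster create command is the one from above):

gcloud auth activate-service-account --key-file=key.json

# ... create the cluster as shown above, then submit a job to it
gcloud dataproc jobs submit spark \
    --cluster=jill-cluster \
    --region=europe-west3 \
    --class=com.example.MyJob \
    --jars=gs://my-bucket/my-job.jar

# tear the cluster down once the jobs are done
gcloud dataproc clusters delete jill-cluster \
    --region=europe-west3 \
    --quiet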

What happens if you just run gcloud dataproc clusters list after running activate-service-account? Did you add the roles at the project level? Or is it possible you added the roles on the service account itself as a target resource? You should expect to see the service account listed as a "member" in gcloud projects get-iam-policy <your-project> and not in gcloud iam service-accounts get-iam-policy <service-account> - Dennis Huo
Thank you for the response. I removed all the roles and left only the "Dataproc Editor" role. Below is the output, run from my user account, which is a project owner: ``` /gcp$ gcloud projects get-iam-policy poc --flatten="bindings[].members" --format='table(bindings.role)' --filter="bindings.members:[email protected]" ROLE roles/dataproc.editor ``` I activated the service account and tried gcloud dataproc clusters list, and am still getting the same error. I think this should be cleared up before I try your bigger answer. - Srinivas Jill
Hmm, that's strange. For the record, with the more fine-grained issues around the different Dataproc roles and the backend/worker roles, you'll usually see much more detailed errors that tell you exactly what you need. The more generic "Not authorized to requested resource" message typically means a failure at the very first entry point. What happens if you try to use that service account to do other GCP actions after granting the necessary roles (maybe make it project editor again just for testing), such as gcloud compute regions list and gcloud compute instances list? - Dennis Huo
I also wonder what would happen if you created a new service account from scratch and just tried to get the basics working on that, to see if it makes any difference. - Dennis Huo
There's also a neat tool called "Policy Troubleshooter": cloud.google.com/console/iam-admin/troubleshooter - try to enter your service account email in there and test it for resource: //cloudresourcemanager.googleapis.com/projects/YOUR_PROJECT and permission dataproc.clusters.list - Dennis Huo

1 Answer


Since there are several relevant features in play here, such as specifying a custom service account as the cluster's identity and using a custom image, a few steps can help narrow down where the main issue lies:

  1. Check whether any access-controlled API call works for the service account when it is given broad permissions (Project Viewer, for example) by trying gcloud compute instances list, gcloud compute regions list, and gcloud dataproc clusters list. This narrows down whether the problem is with Dataproc-specific roles or whether the service account itself is not working for some reason.
  2. If it appears to work for other APIs but not the Dataproc API, try the Policy Troubleshooter tool (see the command sketch after this list) until you can get basic "viewer" types of requests working with the Dataproc API.
  3. If a "default" Dataproc cluster can be created but one with custom options fails, you may need to add more permissions, such as Service Account User for a specified cluster identity service account, Compute Network User for cross-project networks, Compute Image User for cross-project custom images, or Storage Object Creator for cross-project or custom GCS config buckets. Errors on these types of permissions generally come back as detailed messages from the Dataproc API, in contrast to the basic "front-door" access failures described in #1, which can produce more generic messages like the one you saw ("Not authorized to requested resource").
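
For step 2, the Policy Troubleshooter also has a gcloud surface; a sketch of the equivalent check is below (the project ID and service-account email are placeholders, and depending on your gcloud version the command group may only be available under beta):

gcloud beta policy-troubleshoot iam \
    //cloudresourcemanager.googleapis.com/projects/poc \
    --principal-email=my-sa@poc.iam.gserviceaccount.com \
    --permission=dataproc.clusters.list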

Other things to check include making sure you applied the role memberships to the correct resource (in this case, the project itself) rather than to the service account as a resource, since that list of roles should contain everything you need. Check:

gcloud projects get-iam-policy $PROJECT

to make sure all of those roles actually appear there with a members: listing that includes your service account. You should not expect roles like Dataproc Editor to appear on the service account's own resource policy; that is,

gcloud iam service-accounts get-iam-policy $SERVICE_ACCOUNT

should return an empty response with only an etag: field.
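
A handy filtered variant of the project-level check, along the lines of the one you ran in the comments (the project ID and service-account email are placeholders):

gcloud projects get-iam-policy poc \
    --flatten="bindings[].members" \
    --format="table(bindings.role)" \
    --filter="bindings.members:my-sa@poc.iam.gserviceaccount.com"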

When using custom service accounts, it's also important to understand the distinctions between the different roles involved.

One thing to clarify first is that the Dataproc "worker" and the Dataproc creator/user are not generally the same identity, even though they can be. So if you intend for the service account to be used to create Dataproc clusters, Dataproc Editor is correct, but if you also intend for the cluster itself to take on the identity of that service account, you need to grant it the Dataproc Worker role as well: https://cloud.google.com/dataproc/docs/concepts/iam/dataproc-principals
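
For example, granting the Dataproc Worker role at the project level would look roughly like this (the project ID and service-account email are placeholders):

gcloud projects add-iam-policy-binding poc \
    --member="serviceAccount:my-sa@poc.iam.gserviceaccount.com" \
    --role="roles/dataproc.worker"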

In this vein, if you're trying to use a service account to create a cluster that then acts as a specified service account, even if the specified account is "itself", you need to grant the Service Account User role to your creating service account, either at the project level (if you are okay granting it broad actAs permissions in the project) or just on the single service account.
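
A sketch of the narrower per-service-account grant, for the case where the account acts as "itself" (the email is a placeholder, and the member and the target resource are the same account):

gcloud iam service-accounts add-iam-policy-binding \
    my-sa@poc.iam.gserviceaccount.com \
    --member="serviceAccount:my-sa@poc.iam.gserviceaccount.com" \
    --role="roles/iam.serviceAccountUser"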

Since you appear to be using a custom image, and assuming you followed the advanced instructions for creating a Dataproc custom image, you may also have to grant your service account the Compute Image User role.
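
A per-image grant would look roughly like this sketch (the image name is taken from the --image URI in your command; the service-account email is a placeholder):

gcloud compute images add-iam-policy-binding poc-1-5-1-debina10 \
    --project=poc \
    --member="serviceAccount:my-sa@poc.iam.gserviceaccount.com" \
    --role="roles/compute.imageUser"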

In addition, if the image lives in a different project, you may need to check the service account of the form service-[project-number]@dataproc-accounts.iam.gserviceaccount.com, and, if your project was created before ~Sept 2019, also the legacy [project-number]@cloudservices.gserviceaccount.com service account. Those would need to be granted the Compute Image User role on the images themselves or on the project holding them. For same-project images, the existing Dataproc Service Agent role should already include the instanceAdmin permission that covers image use.
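
For that cross-project case, the grants on the image-hosting project would look roughly like this sketch (the project IDs and the project number are placeholders):

# Dataproc service agent of the cluster project
gcloud projects add-iam-policy-binding image-project \
    --member="serviceAccount:service-123456789@dataproc-accounts.iam.gserviceaccount.com" \
    --role="roles/compute.imageUser"

# legacy API service account, for projects created before ~Sept 2019
gcloud projects add-iam-policy-binding image-project \
    --member="serviceAccount:123456789@cloudservices.gserviceaccount.com" \
    --role="roles/compute.imageUser"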