First, create a service account and grant it the following roles:
1. Dataproc Worker: According to the [doc][1]:
To create a cluster with a user-specified service account, the
specified service account must have all permissions granted by the
Dataproc Worker role.
2. Dataproc Hub Agent: This grants permission to act as the service account. Without it, cluster creation fails with the following error:
ERROR: (gcloud.beta.dataproc.clusters.create) INVALID_ARGUMENT: User
not authorized to act as service account
'[email protected]'. To act
as a service account, user must have one of [Owner, Editor, Service
Account Actor] roles. See
https://cloud.google.com/iam/docs/understanding-service-accounts for
additional details.
3. Dataproc Editor: This role grants permission to create and delete Dataproc clusters.
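The role setup above can be sketched with gcloud roughly as follows. Treat it as a minimal sketch under stated assumptions rather than a definitive procedure: the project ID `my-project` and account name `dataproc-sa` are hypothetical placeholders, the role IDs are my assumption of the IDs behind the three role names listed above, and `run`, `setup_service_account`, and the `DRY_RUN` switch are illustrative helpers, not part of gcloud. With `DRY_RUN` left at its default of 1, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create a service account and grant it the three roles listed above.
# PROJECT_ID and SA_NAME are hypothetical placeholders; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-dataproc-sa}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Assumed role IDs for Dataproc Worker, Dataproc Hub Agent, and Dataproc Editor.
ROLES="roles/dataproc.worker roles/dataproc.hubAgent roles/dataproc.editor"

# Illustrative helper: print the command when DRY_RUN=1 (default), else run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_service_account() {
  # Create the service account in the project.
  run gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT_ID"

  # Bind each role to the service account at the project level.
  for role in $ROLES; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" \
      --role="$role"
  done
}

setup_service_account
```

Set `DRY_RUN=0` to actually execute the commands. Whoever runs them needs permission to create service accounts and to edit the project's IAM policy.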
Activate service account: After granting the roles, download the service account JSON key. Activate the new service account with `gcloud auth activate-service-account --key-file=<service-json>`, and check that it is active with `gcloud auth list`. Then set the GOOGLE_APPLICATION_CREDENTIALS environment variable with `export GOOGLE_APPLICATION_CREDENTIALS="service-json-full-path"`.
Now everything should be ready to create a Dataproc cluster using the service account. Here are sample commands to do so:
gcloud auth activate-service-account --key-file=<service-key-file>
export GOOGLE_APPLICATION_CREDENTIALS="<service-key-file>"
gcloud beta dataproc clusters create <CLUSTER-NAME> \
--region=<REGION> \
--project=<PROJECT-ID> \
--service-account=<SERVICE-ACCOUNT-EMAIL> \
--single-node