7
votes

I am trying to run the spark job on the google dataproc cluster as

    gcloud dataproc jobs submit hadoop --cluster <cluster-name> \
        --jar file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
        --class org.apache.hadoop.examples.WordCount \
        -- arg1 arg2

But the job throws the following error:

 (gcloud.dataproc.jobs.submit.spark) PERMISSION_DENIED: Request had insufficient authentication scopes.

How do I add the auth scopes to run the job?

Are you creating the Dataproc cluster (via gcloud dataproc clusters create) and attempting to run the Dataproc job (via gcloud dataproc jobs submit) from the same machine / shell? In general, your credentials need the CLOUD_PLATFORM OAuth scope to interact with Dataproc. – Angus Davis
To elaborate on Angus's question, that also means that if you're running the gcloud command from any GCE VM, you need to have created that VM with --scopes cloud-platform (see the gcloud docs). The same applies if you're running the command from inside a Dataproc cluster; there you'd use gcloud dataproc clusters create --scopes cloud-platform. – Dennis Huo
@Dennis Huo Could you possibly post that comment as an answer so this question may be closed? The only thing I would add is the recent addition of gcloud alpha compute instances set-scopes for correcting the scopes of already existing GCE instances. – Yannick MG
Done, thanks for pinging this. – Dennis Huo

2 Answers

17
votes

Usually this error comes from running gcloud inside a GCE VM that uses VM-metadata-controlled scopes; gcloud installed on a local machine will typically already be using broad scopes that include all GCP operations.
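If you're not sure which scopes your VM was granted, one quick way to check (a sketch; the path below is the standard GCE metadata endpoint) is to query the metadata server from inside the VM:

    # List the OAuth scopes granted to the VM's default service account.
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"

If https://www.googleapis.com/auth/cloud-platform is not in the output, gcloud calls made from that VM will be rejected with insufficient-scope errors regardless of which IAM roles the service account holds.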

For Dataproc access, when creating the VM from which you're running gcloud, you need to specify --scopes cloud-platform on the CLI (see the sketch after the screenshot), or, if creating the VM from the Cloud Console UI, select "Allow full access to all Cloud APIs":

[Screenshot: Cloud Console Create VM UI - Identity and API access]
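From the CLI, the equivalent looks like this (a sketch; the VM name and zone are placeholders, adjust them to your setup):

    # Create the VM with the broad cloud-platform scope so gcloud calls
    # made from it (e.g. dataproc jobs submit) are not scope-restricted.
    gcloud compute instances create my-gcloud-vm \
        --zone us-central1-a \
        --scopes cloud-platform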

As another commenter mentioned above, nowadays you can also update scopes on existing GCE instances to add the CLOUD_PLATFORM scope.
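Depending on your gcloud version this is exposed as gcloud alpha compute instances set-scopes (mentioned in the comments) or gcloud compute instances set-service-account; either way the instance has to be stopped first. A sketch with a placeholder instance name and zone:

    # The instance must be stopped before its scopes can be changed.
    gcloud compute instances stop my-gcloud-vm --zone us-central1-a
    # Broaden the access scopes; add --service-account if the VM uses
    # a non-default service account you want to keep.
    gcloud compute instances set-service-account my-gcloud-vm \
        --zone us-central1-a \
        --scopes cloud-platform
    gcloud compute instances start my-gcloud-vm --zone us-central1-a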

0
votes

You need to check the option allowing full API access while creating the Dataproc cluster; only then can you submit jobs to the cluster using the gcloud dataproc jobs submit command.
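In gcloud terms, that checkbox corresponds to creating the cluster with the cloud-platform scope. A sketch, where the cluster name, region, and bucket paths are placeholders:

    # Create the cluster with full API access so gcloud commands run
    # from its nodes are not scope-restricted.
    gcloud dataproc clusters create my-cluster \
        --region us-central1 \
        --scopes cloud-platform

    # Jobs can then be submitted against it, e.g. the stock WordCount example.
    gcloud dataproc jobs submit hadoop --cluster my-cluster \
        --region us-central1 \
        --jar file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
        -- wordcount gs://my-bucket/input gs://my-bucket/output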