3 votes

I am attempting to submit a training job to ML Engine using gcloud but am running into a service account permissions error that I can't figure out. The model code lives on a Compute Engine instance, from which I run gcloud ml-engine jobs submit as part of a bash script. I have created a service account ([email protected]) for gcloud authentication on the VM instance and a bucket for the job and model data. The service account has been granted the Storage Object Viewer and Storage Object Creator roles on the bucket, and the VM and bucket both belong to the same project.
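
For reference, the roles were granted at the bucket level roughly as follows (a sketch; [service-account-email] stands in for the redacted service account above):

gsutil iam ch \
    serviceAccount:[service-account-email]:objectViewer \
    serviceAccount:[service-account-email]:objectCreator \
    gs://[bucket-name]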

When I try to submit a job per this tutorial, the following are executed:

time_stamp=`date +"%Y%m%d_%H%M"`
job_name='ObjectDetection_'${time_stamp}

gsutil cp object_detection/samples/configs/faster_rcnn_resnet50.config \
    gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config

gcloud ml-engine jobs submit training ${job_name} \
    --project [project-name] \
    --runtime-version 1.12 \
    --job-dir=gs://[bucket-name]/jobs/${job_name} \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_main \
    --region us-central1 \
    --config object_detection/training-config.yml \
    -- \
    --model_dir=gs://[bucket-name]/output/${job_name} \
    --pipeline_config_path=gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config

where [bucket-name] and [project-name] are placeholders for the bucket created above and the project it and the VM are contained in.

The config file is successfully uploaded to the bucket; I can confirm it exists in the Cloud Console. However, the job fails to submit with the following error:

ERROR: (gcloud.ml-engine.jobs.submit.training) User [[email protected]] does not have permission to access project [project-name] (or it may not exist): Field: job_dir Error: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
    field: job_dir

If I look in the Cloud Console, the files specified by --packages exist in that location, and I've ensured the service account [email protected] has been given the Storage Object Viewer and Storage Object Creator roles on the bucket, which uses bucket-level permissions. After ensuring the service account is activated and set as the default, I can also run

gsutil ls gs://[bucket-name]/jobs/ObjectDetection_20190709_2001

which successfully returns the contents of the folder without a permission error. In the project, there also exists a managed service account, service-[project-number]@cloud-ml.google.com.iam.gserviceaccount.com, and I have granted this account the Storage Object Viewer and Storage Object Creator roles on the bucket as well.
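
To double-check the bindings, the bucket's IAM policy can be inspected directly; both the VM service account and the managed service account should appear in the output:

gsutil iam get gs://[bucket-name]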

To confirm the VM itself is able to submit a job, I switched the gcloud user to my personal account, and the script runs and submits a job without any error. However, since this is a shared VM, I would like to rely on service account authorization rather than my own user account.
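
For context, the account switch was done roughly like this (a sketch; [personal-account] is a placeholder for my own user account):

# list the credentialed accounts and see which one is active
gcloud auth list
# switch the active account
gcloud config set account [personal-account]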


2 Answers

2 votes

I had a similar problem with exactly the same error.

I found that the easiest way to troubleshoot these errors is to go to "Logging" and search for the text "PERMISSION DENIED".
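
If you prefer the command line, something like the following should surface the same entries (a sketch; the exact filter field may need adjusting for how your project's logs are structured):

gcloud logging read 'protoPayload.status.message:"PERMISSION_DENIED"' --limit=10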

In my case the service account was missing the storage.buckets.get permission. You then need to find a role that has this permission, which you can do from IAM -> Roles; in that view you can filter roles by permission name. It turned out that only the following roles include the needed permission:

  • Storage Admin
  • Storage Legacy Bucket Owner
  • Storage Legacy Bucket Reader
  • Storage Legacy Bucket Writer

I added the "Storage Legacy Bucket Writer" role to the service account on the bucket and was then able to submit a job.
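
The same grant can be made from the shell (a sketch; [service-account-email] and [bucket-name] are placeholders):

gsutil iam ch serviceAccount:[service-account-email]:legacyBucketWriter gs://[bucket-name]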

0 votes

Have you tried looking at the Compute Engine access scopes? Shut down the instance, click Edit, and change "Cloud API access scopes" to "Allow full access to all Cloud APIs".
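
The same change can be made with gcloud while the instance is stopped (a sketch; [instance-name], [zone], and [service-account-email] are placeholders):

gcloud compute instances stop [instance-name] --zone [zone]
gcloud compute instances set-service-account [instance-name] \
    --zone [zone] \
    --service-account [service-account-email] \
    --scopes cloud-platform
gcloud compute instances start [instance-name] --zone [zone]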