I am trying to submit a training job to ML Engine using gcloud, but I am running into a service account permissions error that I can't figure out. The model code lives on a Compute Engine instance, from which I run gcloud ml-engine jobs submit as part of a bash script. I have created a service account ([email protected]) for gcloud authentication on the VM instance, and I have created a bucket for the job and model data. The service account has been granted the Storage Object Viewer and Storage Object Creator roles on the bucket, and the VM and the bucket belong to the same project.
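For reference, bucket-level grants like the ones described above can be expressed with gsutil iam ch. The following is only a sketch: it prints the commands instead of running them, the helper name grant_role_cmd is hypothetical, and the service account email and bucket name are placeholder values rather than the real ones. It assumes the short role names objectViewer and objectCreator, which is the form shown in the gsutil documentation for iam ch.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands rather than executing them.
# grant_role_cmd, SA_EMAIL and BUCKET are illustrative placeholders,
# not values from the original setup.

# Print the gsutil command that grants one role to one service account on one bucket.
grant_role_cmd() {
  local sa_email="$1" role="$2" bucket="$3"
  printf 'gsutil iam ch serviceAccount:%s:%s gs://%s\n' "${sa_email}" "${role}" "${bucket}"
}

SA_EMAIL="my-sa@my-project.iam.gserviceaccount.com"
BUCKET="my-bucket"

# Storage Object Viewer and Storage Object Creator, as described above.
for role in objectViewer objectCreator; do
  grant_role_cmd "${SA_EMAIL}" "${role}" "${BUCKET}"
done
```

Printing the commands first makes it easy to review exactly which member and bucket would be affected before anything is changed.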
When I try to submit a job per this tutorial, the following commands are executed:
time_stamp=`date +"%Y%m%d_%H%M"`
job_name='ObjectDetection_'${time_stamp}
gsutil cp object_detection/samples/configs/faster_rcnn_resnet50.config \
    gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
gcloud ml-engine jobs submit training ${job_name} \
--project [project-name] \
--runtime-version 1.12 \
--job-dir=gs://[bucket-name]/jobs/${job_name} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region us-central1 \
--config object_detection/training-config.yml \
-- \
--model_dir=gs://[bucket-name]/output/${job_name} \
--pipeline_config_path=gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
where [bucket-name] and [project-name] are placeholders for the bucket created above and the project that contains both it and the VM.
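To make it easier to check exactly which paths the script passes to gcloud, the name and path construction can be pulled into small helpers. This is a minimal sketch, assuming the same naming scheme as the script above; the function names and the bucket name my-bucket are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of the original script.

# Build the job name from a timestamp (defaults to the current time).
build_job_name() {
  local time_stamp="${1:-$(date +"%Y%m%d_%H%M")}"
  printf 'ObjectDetection_%s\n' "${time_stamp}"
}

# Build the value passed to --job-dir.
build_job_dir() {
  local bucket="$1" job_name="$2"
  printf 'gs://%s/jobs/%s\n' "${bucket}" "${job_name}"
}

# Build the value passed to --pipeline_config_path.
build_config_path() {
  local bucket="$1" job_name="$2"
  printf 'gs://%s/training_configs/faster-rcnn-resnet50.%s.config\n' "${bucket}" "${job_name}"
}

# Example with a fixed timestamp and a placeholder bucket name.
job_name="$(build_job_name 20190709_2001)"
build_job_dir "my-bucket" "${job_name}"      # gs://my-bucket/jobs/ObjectDetection_20190709_2001
build_config_path "my-bucket" "${job_name}"
```

Echoing these values before the submit call confirms that the directory named in any error message is the one the script intended.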
The config file is successfully uploaded to the bucket, and I can confirm in the cloud console that it exists. However, the job fails to submit with the following error:
ERROR: (gcloud.ml-engine.jobs.submit.training) User [[email protected]] does not have permission to access project [project-name] (or it may not exist): Field: job_dir Error: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
- '@type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
field: job_dir
If I look in the cloud console, the files specified by --packages exist in that location, and I've ensured the service account [email protected] has been given the Storage Object Viewer and Storage Object Creator roles for the bucket, which has bucket-level permissions set. After ensuring the service account is activated and is the default account, I can also run
gsutil ls gs://[bucket-name]/jobs/ObjectDetection_20190709_2001
which successfully returns the contents of the folder without a permission error. The project also contains a managed service account, service-[project-number]@cloud-ml.google.com.iam.gserviceaccount.com, and I have granted this account the Storage Object Viewer and Storage Object Creator roles on the bucket as well.
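The checks described above (activating the service account, confirming it is the active account, inspecting the bucket policy, and listing the job directory) can be collected into one function. This is a sketch under assumptions: the run and check_access helpers are hypothetical, the key file path and bucket name are placeholders, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only. DRY_RUN=1 (the default here) prints each command instead of running it.
# run, check_access, the key file path and the bucket name are illustrative placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

check_access() {
  local key_file="$1" bucket="$2" job_name="$3"
  # Activate the service account from its key file.
  run gcloud auth activate-service-account --key-file="${key_file}"
  # Show which account gcloud currently treats as active.
  run gcloud auth list --filter=status:ACTIVE '--format=value(account)'
  # Show the IAM bindings set on the bucket.
  run gsutil iam get "gs://${bucket}"
  # List the job directory named in the error message.
  run gsutil ls "gs://${bucket}/jobs/${job_name}"
}

check_access "/path/to/key.json" "my-bucket" "ObjectDetection_20190709_2001"
```

Setting DRY_RUN=0 would execute the commands for real; comparing the active account with the user named in the error message shows whether the submit call and the gsutil call are authenticating as the same identity.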
To confirm that this VM is able to submit a job, I switched the gcloud user to my personal account, and the script ran and submitted a job without any error. However, since this is a shared VM, I would like to rely on service account authorization instead of my own user account.