
I want to run a TensorFlow training script on Google Cloud ML Engine. One of the buckets it reads from belongs to an external project. I created a Cloud ML Engine service account and added it as a user to that external project.

After that, I ran the following command in my terminal, where gcloud is initialised for my project:

gcloud auth activate-service-account --key-file=my-service-acc-key.json

Then I submitted my job with:

gcloud ml-engine jobs submit training ..arguments

The job was submitted successfully and ran until it tried to access resources in the external bucket with file_io.FileIO('gs://external-bucket').

At that point I got the following error: SSL: no alternative certificate subject name matches target host ${bucket-name}.storage.googleapis.com

It looks like something is wrong with the credentials, but I can't find anything useful in the documentation.

What could be the problem?

Could you confirm whether ${bucket-name} was in the original error message or did you just replace the original bucket name with it for privacy reasons? If ${bucket-name} was in the original error message, that probably means that you're trying to read an object from the bucket ${bucket-name} which is not a real bucket, so there might be a bug with string substitution in your training Python code. - Alexey Surkov
@AlexeySurkov Hi, the bucket name was replaced for security reasons, but it is a domain-named bucket, e.g. img.domain.com/pictures/. The bucket is accessible and can be reached with gsutil, e.g. gsutil ls bucket-name, when the user is authenticated as the service account. By the way, I'm able to read from another bucket in the same project whose name is not a domain name. - Vlad Shkola

1 Answer


Unfortunately, domain-named buckets like bucketname.domainname.com are not properly supported by the GCS client library inside TensorFlow at the moment.
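The error message suggests why the dots matter. The request goes to a host of the form ${bucket-name}.storage.googleapis.com, and a TLS wildcard name covers only a single DNS label. The sketch below is a simplified illustration of that matching rule, not the actual TensorFlow or TLS library code; the certificate name `*.storage.googleapis.com` and the function name `wildcard_cert_matches` are assumptions made for the example.

```python
def wildcard_cert_matches(pattern: str, hostname: str) -> bool:
    """Simplified check: does a certificate name such as '*.example.com' cover hostname?

    A '*' label stands for exactly one non-empty DNS label, so the pattern
    and the hostname must have the same number of labels.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    for p, h in zip(pattern_labels, host_labels):
        if p == "*":
            if not h:
                return False
        elif p != h:
            return False
    return True


# Assumed wildcard name on the server certificate (illustrative only).
CERT_NAME = "*.storage.googleapis.com"

for bucket in ("plain-bucket", "img.domain.com"):
    host = f"{bucket}.storage.googleapis.com"
    print(host, wildcard_cert_matches(CERT_NAME, host))
```

Under these assumptions a bucket name without dots yields a matching host, while a domain-named bucket adds extra labels and no longer matches, which is consistent with the "no alternative certificate subject name matches target host" error. A path-style URL, where the host is just storage.googleapis.com and the bucket name appears in the path, would not have this mismatch.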

This problem was fixed in the TensorFlow Google repository today.

The fix should become available in the TensorFlow GitHub repository within 2-3 days from now, after which you should be able to either build TensorFlow from head or take a nightly Linux build and provide it as one of the package_uris when submitting a training job to Cloud ML Engine.

Alternatively you can wait until it gets picked up by the next official TensorFlow release supported by Cloud ML Engine.
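The first option (supplying a custom TensorFlow build as an extra package) might look roughly like the sketch below. It only assembles and prints a gcloud command line, using the `--packages` flag to pass the wheel. The job name, module name, paths, region, and wheel filename are all placeholders, and whether the custom wheel takes precedence over the preinstalled TensorFlow is something to verify for your runtime version.

```python
import shlex


def build_submit_command(job_name, custom_tf_wheel):
    """Assemble a 'gcloud ml-engine jobs submit training' command as an argument list.

    All values other than the gcloud flags themselves are hypothetical placeholders.
    """
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--module-name", "trainer.task",      # placeholder training module
        "--package-path", "trainer/",          # placeholder trainer package directory
        "--packages", custom_tf_wheel,         # extra package: the custom TensorFlow wheel
        "--job-dir", "gs://my-staging-bucket/jobs/" + job_name,  # placeholder bucket
        "--region", "us-central1",             # placeholder region
    ]


# Placeholder wheel path; use the real filename of the nightly or locally built wheel.
command = build_submit_command("my_job", "dist/custom_tensorflow.whl")
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list keeps each argument separate, so it can be passed directly to `subprocess.run(command, check=True)` once the placeholders are replaced.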