I'm building a Docker container that submits ML training jobs using gcloud. The entry point is a Python program that runs gcloud via subprocess.check_output. The same program works fine outside the Docker container, so I suspect some dependency is not installed, but gcloud outputs no useful logs at all.

When the program runs gcloud ml-engine jobs submit training, the command exits with status 1 and prints only Internal Error. The logs available in the Google Cloud Console are always 5 entries of "Validating job requirements..." with no further information.

The Docker container has the following installed dependencies (some are not relevant to Google Cloud ML but are used by other features in the program):

Via apt-get: python, python-pip, python-dev, libmysqlclient-dev, curl

Via pip install: flask, MySQL-python, configparser, pandas, tensorflow

The gcloud tool itself is installed by downloading the SDK and running its installer from the command line:

RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
RUN mkdir -p /usr/local/gcloud
RUN tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz
RUN /usr/local/gcloud/google-cloud-sdk/install.sh
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin

Account credentials are set up via

RUN gcloud auth activate-service-account --key-file path-to-keyfile-in-docker-container
RUN gsutil version -l

The final gsutil version -l command is only there to confirm that the SDK installation works.
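Since the program works on the host but not in the container, one thing worth comparing is what gcloud itself reports in each environment. The sketch below is only an illustration: gcloud_info is a hypothetical helper name, and it assumes that gcloud info (which prints the installation details, active account, project and log directory) is enough to spot a difference. It is written to run on both Python 2 and 3.

```python
import subprocess


def gcloud_info(executable="gcloud"):
    """Return the text printed by `<executable> info`.

    If the executable cannot be started at all (for example because it is
    not on PATH inside the container), return a short message instead of
    raising, so the caller can log it.
    """
    try:
        proc = subprocess.Popen(
            [executable, "info"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so nothing is lost
            universal_newlines=True,   # text output on Python 2 and 3
        )
    except OSError as exc:
        return "could not start %s: %s" % (executable, exc)
    output, _ = proc.communicate()
    return output
```

Printing gcloud_info() once on the host and once from the running container, then comparing the account, project and Python lines, may show what differs between the two.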

Does anyone have any clue what might be happening, or how to further debug what might be causing an Internal Error in gcloud?

Thanks in advance! :)

1 Answer

Please check that every parameter of the submit command is set correctly, and that your trainer code and all of its dependencies are packaged and uploaded properly.

If everything is done and you still can't run the job, you will need more than just "Internal Error" to pinpoint the issue. Please either contact Google Cloud Platform support or file a bug on the Public Issue Tracker to get further assistance.
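To get more than "Internal Error" out of the tool before contacting support, it may help to capture stderr as well as stdout and to raise gcloud's log level. subprocess.check_output captures only stdout by default, while command-line tools commonly write diagnostics to stderr. The following is a minimal sketch, not a definitive fix: run_with_diagnostics and with_debug_verbosity are hypothetical helper names, and the only gcloud feature assumed is the global --verbosity flag. It runs on both Python 2 and 3.

```python
import subprocess


def with_debug_verbosity(gcloud_args):
    """Build a gcloud command line that asks for debug-level logging.

    gcloud_args is the list of arguments that normally follows `gcloud`,
    e.g. ["ml-engine", "jobs", "submit", "training", ...].
    """
    return ["gcloud"] + list(gcloud_args) + ["--verbosity", "debug"]


def run_with_diagnostics(cmd):
    """Run cmd and return (exit_code, stdout, stderr) without raising.

    Unlike check_output, a non-zero exit status does not raise, so the
    caller can always log both output streams.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text output on Python 2 and 3
    )
    out, err = proc.communicate()
    return proc.returncode, out, err
```

Calling run_with_diagnostics(with_debug_verbosity(args)), where args is the argument list the program already passes after gcloud, should surface whatever gcloud writes to stderr. That output is also the kind of detail a support ticket or issue tracker report would need.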