I'm building a Docker container that submits ML training jobs using gcloud. The entry point is a Python program that invokes gcloud via subprocess.check_output. Running the program outside a Docker container works fine, which makes me suspect that some dependency is missing inside the container, but gcloud outputs no useful logs at all.
When the program runs gcloud ml-engine jobs submit training, the executable returns exit status 1 and prints only Internal Error. The logs available in the Google Cloud Console are always five entries of "Validating job requirements..." with no further information.
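For context, here is a minimal sketch of how such a call could be wrapped so that stderr is captured along with stdout and gcloud's global --verbosity debug flag is passed. It is not the actual program: the function names, the job name and the extra arguments are hypothetical placeholders, and it assumes gcloud is on the PATH of the Python process.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (exit_status, output_text).

    stderr is redirected into stdout so that nothing the tool prints is lost.
    """
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return 0, out.decode("utf-8", "replace")
    except subprocess.CalledProcessError as exc:
        # check_output raises on a non-zero exit; the captured text is in exc.output.
        return exc.returncode, exc.output.decode("utf-8", "replace")


def build_submit_command(job_name, extra_args):
    """Build the gcloud argument list (job_name and extra_args are placeholders)."""
    return ["gcloud", "ml-engine", "jobs", "submit", "training", job_name,
            "--verbosity", "debug"] + list(extra_args)


def report(status, text):
    """Write the captured output to stderr so it shows up in the container logs."""
    sys.stderr.write("command exited with status %d\n%s\n" % (status, text))
```

Calling report(*run_and_capture(build_submit_command(...))) would then surface whatever gcloud writes before it exits. If gcloud cannot be found at all, subprocess raises an OSError instead, which this sketch deliberately does not catch.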
The Docker container has the following installed dependencies (some are not relevant to Google Cloud ML but are used by other features in the program):
Via apt-get: python, python-pip, python-dev, libmysqlclient-dev, curl
Via pip install: flask, MySQL-python, configparser, pandas, tensorflow
The gcloud tool itself is installed by downloading the SDK and running its install script from the command line:
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
RUN mkdir -p /usr/local/gcloud
RUN tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz
RUN /usr/local/gcloud/google-cloud-sdk/install.sh
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
Account credentials are set up via
RUN gcloud auth activate-service-account --key-file path-to-keyfile-in-docker-container
RUN gsutil version -l
The final gsutil version command is there only to confirm that the SDK installation works.
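A comparable check can be made from inside the Python process, since what matters is the PATH that the program itself sees at run time. The sketch below is only illustrative: find_on_path and describe_environment are hypothetical helper names, and it assumes a POSIX-style container where executables carry the execute bit.

```python
import os
import sys


def find_on_path(name, path=None):
    """Return the full path of executable `name` found on `path`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def describe_environment():
    """Collect the interpreter and SDK tool locations visible to this process."""
    return {
        "python": sys.executable,
        "gcloud": find_on_path("gcloud"),
        "gsutil": find_on_path("gsutil"),
    }
```

A None value for gcloud or gsutil would indicate that the ENV PATH line did not take effect for the running program.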
Does anyone have any idea what might be happening, or how to further debug what might be causing an Internal Error in gcloud?
Thanks in advance! :)