I'm trying to train an Object Detection model with gcloud ml-engine, following the official documentation at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md. I set runtime-version=1.4 and modified setup.py as described in this issue: https://github.com/tensorflow/models/issues/2739. However, I get the following error:
worker-replica-3 2018-01-09 06:32:39.416080: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
worker-replica-3 grpc epoll fd: 3
{
insertId: "1fwigqcg5k37j2o"
jsonPayload: {
created: 1515479559.41658
levelname: "ERROR"
lineno: 1051
message: " grpc epoll fd: 3"
pathname: "ev_epoll1_linux.c"
thread: 917
}
The last error message is:
The replica master 0 ran out-of-memory and exited with a non-zero status of 247.
I started the training job on Cloud ML Engine with the following command:
gcloud ml-engine jobs submit training object_detection_training_`date +%s` \
--job-dir=gs://mybucket/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region asia-east1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://mybucket/train \
--pipeline_config_path=gs://mybucket/data/ssd_mobilenet_v1_coco.config \
--runtime-version 1.4
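
For reference, the job name in the command above is meant to end in a Unix timestamp taken from `date +%s`, so that each submission gets a unique name. A minimal sketch of building that name on its own is shown here; `JOB_NAME` is a hypothetical variable name that does not appear in my original command.

```shell
# Build a unique job name with a Unix-timestamp suffix.
# JOB_NAME is a hypothetical variable, introduced only for illustration.
JOB_NAME="object_detection_training_$(date +%s)"

# Print the name that would be passed to `gcloud ml-engine jobs submit training`.
echo "${JOB_NAME}"
```

The result contains only letters, digits, and underscores, and it can be used in place of the inline `date +%s` substitution.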