I am training a model on more than 20 GB of data using the BASIC tier in Cloud ML Engine. The jobs fail because the Cloud ML machine runs out of disk space, and I cannot find any details about this in the gcloud ML documentation [https://cloud.google.com/ml-engine/docs/tensorflow/machine-types].
I need help deciding which tier to use for my training jobs. Also, the utilisation shown in the Job Details graphs is very low.
{
insertId: "1klpt2"
jsonPayload: {
created: 1554434546.3576794
levelname: "ERROR"
lineno: 51
message: "Failed to train : [Errno 28] No space left on device"
pathname: "/root/.local/lib/python3.5/site-packages/loggerwrapper.py"
}
labels: {
compute.googleapis.com/resource_id: ""
compute.googleapis.com/resource_name: "cmle-training-10361805218452604847"
compute.googleapis.com/zone: ""
ml.googleapis.com/job_id/log_area: "root"
ml.googleapis.com/trial_id: ""
}
logName: "projects/backend/logs/master-replica-0"
receiveTimestamp: "2019-03-31T12:32:30.07683Z"
resource: {
labels: {
job_id: ""
project_id: "backend"
task_name: "master-replica-0"
}
type: "ml_job"
}
severity: "ERROR"
timestamp: "2019-03-31T12:32:26.357679367Z"
}
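
To confirm that it is the local disk that fills up, a minimal sketch like the one below could log the free space on the worker before the training data is written. It uses only `shutil.disk_usage` from the Python standard library. The helper names `free_space_report` and `has_room_for` and the `/` path are illustrative assumptions, not part of my trainer or of any Cloud ML API.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def free_space_report(path="/"):
    """Return (total, used, free) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.total / GIB, usage.used / GIB, usage.free / GIB


def has_room_for(required_bytes, path="/"):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical check: the path "/" is an assumption about where data is written.
    total, used, free = free_space_report("/")
    print("disk total=%.1f GiB used=%.1f GiB free=%.1f GiB" % (total, used, free))
    if not has_room_for(20 * GIB, "/"):
        print("less than 20 GiB free; copying the full dataset locally would likely fail")
```

Calling this at the start of the job would show in the logs how much space the machine actually offers, which is the number I could not find in the documentation.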