ML Engine: "Bad model detected... No module named trainer" when creating a model version

Question

I have successfully trained a model on ML Engine. I can download the model.joblib file from my Cloud Storage bucket, load it, and get local predictions using gcloud. However, I can't create a model version.

JOB_DIR=$(gcloud ml-engine jobs describe "$JOB" \
    --format="value(trainingInput.jobDir)")

gcloud ml-engine versions create "$VERSION" \
  --model "$MODEL_NAME" \
  --origin "$JOB_DIR" \
  --framework scikit-learn \
  --runtime-version 1.10 \
  --python-version 3.5

Returns:

ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. No module named 'trainer'. (Error code: 0)"

How can I fix this error?

My model is a sklearn Pipeline. It uses a FunctionTransformer that calls a function in the trainer.model module. I suspect that the VM serving predictions doesn't have the trainer package installed, but I haven't been able to confirm or reject this hypothesis, or to find anything in the documentation about how to point the model version at the package.
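
The hypothesis can be checked with the standard-library pickle module, which joblib builds on: a plain module-level function (such as the one a FunctionTransformer wraps) is stored as a module path plus a name, not as code, so loading it requires that module to be importable. This is a minimal sketch under stated assumptions: `trainer.model` is faked in `sys.modules`, `log_transform` is a hypothetical stand-in for the real function, and a plain dict stands in for the fitted Pipeline.

```python
import pickle
import pickletools
import sys
import types

# Hypothetical stand-in for trainer/model.py from the training package.
SOURCE = """
import math

def log_transform(values):
    return [math.log1p(v) for v in values]
"""


def install_fake_trainer():
    """Register fake 'trainer' and 'trainer.model' modules, as on the training VM."""
    package = types.ModuleType("trainer")
    model = types.ModuleType("trainer.model")
    exec(SOURCE, model.__dict__)  # defined functions get __module__ == "trainer.model"
    package.model = model
    sys.modules["trainer"] = package
    sys.modules["trainer.model"] = model
    return model


def uninstall_fake_trainer():
    """Remove the fake modules, as on a serving VM without the package."""
    sys.modules.pop("trainer.model", None)
    sys.modules.pop("trainer", None)


def global_references(payload):
    """List the 'module name' pairs recorded by GLOBAL opcodes (protocols 0-3)."""
    return [arg for opcode, arg, _ in pickletools.genops(payload)
            if opcode.name == "GLOBAL"]


model = install_fake_trainer()
# Only a reference to the function is serialized, not its code.
payload = pickle.dumps({"transform": model.log_transform}, protocol=2)
print(global_references(payload))  # prints ['trainer.model log_transform']

uninstall_fake_trainer()
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    print(exc)  # prints No module named 'trainer'
```

On the serving VM the same lookup happens when model.joblib is loaded, so the trainer package has to be importable there for the model to load.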

N3da N3da · Accepted Answer · 2018-10-29T23:20:04

Your hypothesis is correct. Uploading a custom package for use at prediction time is currently available only as an alpha feature. You can request access via this sign-up form: https://docs.google.com/forms/d/e/1FAIpQLSc6fxgXQIyA6BDLfCKOJPu5CyCuOB_M_rGTws0629od5mlznw/viewform?usp=sf_link
