I have successfully trained a scikit-learn model on ML Engine. I can download the model.joblib file from my Cloud Storage bucket, load it, and get local predictions using gcloud. However, I can't create a model version.

I am using the sklearn_crfsuite estimator:

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=2,
    all_possible_transitions=True
)

I'm saving the model as described below:

model = 'model.joblib'
joblib.dump(crf, model)

My setup.py for training is:

'''Cloud ML Engine package configuration.'''
from setuptools import setup, find_packages



REQUIRED_PACKAGES = ['joblib==0.13.0',
                     'sklearn-crfsuite==0.3.6',
                     'sklearn==0.0'
                    ]

setup(name='trainer',
      version='1.0',
      packages=find_packages(),
      include_package_data=True,
      install_requires=REQUIRED_PACKAGES)

I submit the training package:

gcloud ml-engine jobs submit training train_$JOB_NAME \
--runtime-version 1.8 \
--python-version 2.7 \
--job-dir=gs://$BUCKET_NAME/jobs/$JOB_NAME/ \
--package-path=trainer \
--module-name trainer.model \
--region $REGION \
--scale-tier BASIC \
-- \
--train-data-dir=gs://$BUCKET_NAME/dataset \
--job-dir=gs://$BUCKET_NAME/jobs/$JOB_NAME

The model is trained and exported to job-dir, but when I try to deploy it:

gcloud alpha ml-engine versions create v1 --model teste --origin \
$ORI --python-version 2.7 --runtime-version 1.8 --framework scikit-learn

it reports this error:

ERROR: (gcloud.alpha.ml-engine.versions.create) Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. No module named sklearn_crfsuite.estimator. (Error code: 0)"

1 Answer

Could you verify that you have the directory structure correct?

  • You do not need to include sklearn in your setup.py, since it is provided by the framework. To avoid confusion, please remove it from REQUIRED_PACKAGES.

  • You can check that your setup.py is correct by moving import joblib before the import of sklearn_crfsuite and seeing whether that works.

  • Make sure setup.py is parallel to trainer (i.e. one directory up from model.py). See this GitHub repo for an example:

https://github.com/GoogleCloudPlatform/training-data-analyst/tree/master/blogs/sklearn/babyweight
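
If it helps, the layout check from the last bullet can be sketched as a small script. This is only a rough sketch, not an official tool: the function name check_trainer_layout is made up for illustration, trainer/model.py is inferred from your --module-name trainer.model flag, and the __init__.py entry is there because find_packages() only picks up directories that contain one.

```python
import os


def check_trainer_layout(project_root):
    """Return a list of layout problems under project_root (empty if none).

    Hypothetical helper: the expected paths below are assumptions based on
    the setup.py and the --module-name flag shown in the question.
    """
    expected = [
        # setup.py must sit next to the trainer/ directory, not inside it.
        "setup.py",
        # find_packages() only discovers directories containing __init__.py.
        os.path.join("trainer", "__init__.py"),
        # Matches --module-name trainer.model.
        os.path.join("trainer", "model.py"),
    ]
    problems = []
    for relative_path in expected:
        if not os.path.isfile(os.path.join(project_root, relative_path)):
            problems.append("missing: " + relative_path)
    return problems


if __name__ == "__main__":
    # Run from the directory that is supposed to contain setup.py.
    for problem in check_trainer_layout("."):
        print(problem)
```

Run it from the directory you pass to gcloud. If it prints nothing, the files are where the bullet above says they should be.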