My goal is to export an H2O model trained on Spark with Scala (using Sparkling Water), so that I can import it in an application that does not use Spark.
Thus:
- using Scala (the documentation only shows examples for R and Python)
- export a model that is built using Sparkling Water (H2O with Spark)
- import the model in Scala (without Spark or an H2O cluster, only the `hex-genmodel` package)
I'm therefore using `ModelSerializationSupport` to export, and `MojoModel.load` to import:
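```scala
import java.io.File
import hex.genmodel.MojoModel
import hex.genmodel.easy.EasyPredictModelWrapper
import hex.genmodel.utils.DistributionFamily
import hex.tree.gbm.GBM
import hex.tree.gbm.GBMModel.GBMParameters
import water.support.ModelSerializationSupport

// train and valid are the training and validation frames, created earlier (not shown)
val gbmParams = new GBMParameters()
gbmParams._train = train
gbmParams._response_column = "target"
gbmParams._ntrees = 5
gbmParams._valid = valid
gbmParams._nfolds = 3
gbmParams._min_rows = 1
gbmParams._distribution = DistributionFamily.multinomial
val gbm = new GBM(gbmParams)
val gbmModel = gbm.trainModel.get // blocks until training has finished
val mojoPath = "./model.zip"
ModelSerializationSupport.exportMOJOModel(gbmModel, new File(mojoPath).toURI, force = true)
val simpleModel = new EasyPredictModelWrapper(MojoModel.load(mojoPath))
```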
This fails with:

```
error in opening zip file
java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:220)
at java.util.zip.ZipFile.<init>(ZipFile.java:150)
at java.util.zip.ZipFile.<init>(ZipFile.java:121)
at hex.genmodel.ZipfileMojoReaderBackend.<init>(ZipfileMojoReaderBackend.java:13)
at hex.genmodel.MojoModel.load(MojoModel.java:33)
...
```
It seems that the MOJO exporter does not write the format that `hex.genmodel` expects (apparently a zip archive).
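Whether the exporter produced a zip archive at all can be checked without H2O: a non-empty ZIP file starts with the local-file-header signature `PK\x03\x04`. The JDK-only sketch below checks that signature and lists the archive's entries. `MojoFileCheck` and its methods are hypothetical helpers, not part of H2O or Sparkling Water, and the expectation that a loadable MOJO contains `model.ini` is an assumption based on the second stack trace in this post.

```scala
import java.io.{DataInputStream, File, FileInputStream}
import java.util.zip.ZipFile

object MojoFileCheck {
  // ZIP local-file-header signature: "PK" followed by 0x03 0x04.
  private val ZipMagic = Array[Byte](0x50, 0x4B, 0x03, 0x04)

  // True if `f` is a regular file whose first four bytes are the ZIP signature.
  // A directory or a missing path is never reported as a zip.
  def looksLikeZip(f: File): Boolean =
    f.isFile && f.length >= 4 && {
      val in = new DataInputStream(new FileInputStream(f))
      try {
        val header = new Array[Byte](4)
        in.readFully(header)
        java.util.Arrays.equals(header, ZipMagic)
      } finally in.close()
    }

  // Names of the entries in the archive, or Nil if `f` is not a zip.
  // Assumption: a MOJO that MojoModel.load accepts would list "model.ini" here.
  def entryNames(f: File): List[String] =
    if (!looksLikeZip(f)) Nil
    else {
      val zip = new ZipFile(f)
      try {
        val it = zip.entries()
        val names = List.newBuilder[String]
        while (it.hasMoreElements) names += it.nextElement().getName
        names.result()
      } finally zip.close()
    }
}
```

For instance, `MojoFileCheck.looksLikeZip(new File("./model.zip"))` returning `false` would confirm that the export did not produce a zip archive at all, which matches the `ZipException` above.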
I'm running Sparkling Water 2.1.23 (2.1.24 fails when building the cluster, as reported in https://0xdata.atlassian.net/browse/SW-776) on Spark 2.1.
Update:
Using `ModelSerializationSupport` to load its own export fails too, with the same exception:

```scala
ModelSerializationSupport.loadMOJOModel(new File(mojoPath).toURI)
```
H2OModel export and load
Loading it back as an `H2OModel` (thus with Sparkling Water) does work:
```scala
val h2oModelPath = "./model_h2o"
ModelSerializationSupport.exportH2OModel(gbmModel, new File(h2oModelPath).toURI, force = true)
val loadedModel: GBMModel = ModelSerializationSupport.loadH2OModel(new File(h2oModelPath).toURI)
```
H2OMOJOModel export and load
Loading it back with `H2OMOJOModel` also works (code copied from the implementation of `H2OGBM`):
```scala
val mojoModel = new H2OMOJOModel(ModelSerializationSupport.getMojoData(gbmModel))
mojoModel.write.overwrite.save(mojoPath)
H2OMOJOModel.load(mojoPath)
```
H2OGBM export with MojoModel import
Attempting to import it with the regular `MojoModel` fails, though:
```scala
val gbm = new H2OGBM(gbmParams)(h2oContext, myspark.sqlContext)
val gbmModel = gbm.trainModel(gbmParams)
val mojoPath = "./models.zip"
gbmModel.write.overwrite.save(mojoPath)
MojoModel.load(mojoPath)
```
with the following exception:
```
./models.zip/model.ini (No such file or directory)
java.io.FileNotFoundException: ./models.zip/model.ini (No such file or directory)
```
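The path in this message, `./models.zip/model.ini`, suggests that `gbmModel.write.overwrite.save` created a directory named `models.zip` (Spark ML writers typically save to a directory rather than to a single file) and that `MojoModel.load` then looked for `model.ini` inside it. A small JDK-only sketch to check what is actually at the path; `PathReport` is a hypothetical helper, not part of H2O or Sparkling Water:

```scala
import java.io.File

object PathReport {
  // Hypothetical helper: describe what sits at `path`, to check whether the
  // input handed to MojoModel.load is a single file or a directory.
  def describe(path: String): String = {
    val f = new File(path)
    if (f.isDirectory) {
      // listFiles() returns null on an I/O error, so guard it with Option.
      val names = Option(f.listFiles()).map(_.map(_.getName).sorted.toList).getOrElse(Nil)
      s"directory containing: ${names.mkString(", ")}"
    } else if (f.isFile) s"regular file of ${f.length} bytes"
    else "nothing at this path"
  }
}
```

Calling `PathReport.describe("./models.zip")` after the `save` call would show whether a directory or a single archive was written.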