Saving regression model in pySpark

Question

In pySpark MLlib there seems to be no way to save and load regression models such as LogisticRegressionModel, SVMModel, NaiveBayesModel and DecisionTreeModel. The recommender model MatrixFactorizationModel has load and save through the JavaSaveable and JavaLoader mixins, but the regression models are not implemented this way.

Is there a way that I could work around this by supplying my own load and save routines? If so, how would I go about this?

Is this functionality expected in a future release, or is pySpark MLlib being phased out?

Tarantula · Accepted Answer · 2015-04-20T04:55:22

In Spark 1.3.1, the LinearModel class, which is the base class for most of the linear classifiers (e.g. LogisticRegressionModel), is a pure Python class. You can therefore try to pickle the model directly, or save its two attributes yourself, the weights (_coeff, exposed as weights) and the intercept (_intercept, exposed as intercept), and then construct a LogisticRegressionModel by passing both the weights and the intercept term, as in the example below:

from pyspark.mllib.classification import LogisticRegressionModel
model = LogisticRegressionModel(weights, intercept)
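The second approach (saving the two attributes and rebuilding the model) could be sketched roughly as follows. The helper names save_linear_model and load_linear_model are hypothetical and not part of pySpark. The sketch assumes only that the model exposes an iterable weights attribute and a numeric intercept, and that the model class can be constructed as model_cls(weights, intercept).

```python
import pickle


def save_linear_model(model, path):
    """Write a linear model's weights and intercept to `path` using pickle.

    Hypothetical helper, not a pySpark API.
    """
    # Assumption: `model.weights` is iterable and `model.intercept` is a number.
    # Converting to plain Python floats keeps the file independent of Spark types.
    state = {
        "weights": [float(w) for w in model.weights],
        "intercept": float(model.intercept),
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_linear_model(path, model_cls):
    """Rebuild a model by calling model_cls(weights, intercept).

    Hypothetical helper; `model_cls` would be e.g. LogisticRegressionModel.
    """
    with open(path, "rb") as f:
        state = pickle.load(f)
    return model_cls(state["weights"], state["intercept"])
```

With a trained model this might be used as save_linear_model(model, "lr.pkl") and later load_linear_model("lr.pkl", LogisticRegressionModel). Storing plain floats rather than pickling the whole object means the file does not depend on the internal layout of the model class.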
