1
votes

I am trying to set the initial weights or parameters for a machine learning (classification) algorithm in Spark 2.x. Unfortunately, apart from the multilayer perceptron classifier, no other algorithm provides a way to set initial weights or parameter values.

I am trying to do incremental learning with Spark: I need to load an old model and re-train it with the new data that arrives in the system. How can I do this?

How can I do this for other algorithms like:

  • Decision Trees
  • Random Forest
  • SVM
  • Logistic Regression

I need to experiment with multiple algorithms and then choose the best-performing one.

1
Which other Spark algorithms do have weights? I kindly suggest you be specific, and leave "any other algorithms" aside... - desertnaut
Question Edited - Jack Daniel
These algorithms don't have weights in the first place... - desertnaut
If I want to do Incremental learning, how can I do with these algorithms? - Jack Daniel
That's a completely different question (and a too-broad one, arguably)... - desertnaut

1 Answer

0
votes

How can I do this for other algorithms like:

  • Decision Trees
  • Random Forest

You cannot. Tree-based algorithms are not well suited to incremental learning: they look at global properties of the data and have no "initial weights or values" that can be used to bootstrap the process.

  • Logistic Regression

You can use StreamingLogisticRegressionWithSGD (from org.apache.spark.mllib.classification), which implements exactly the required process, including setting the initial weights with setInitialWeights.
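A minimal PySpark sketch of that approach, under stated assumptions: `training_stream` is a DStream of `LabeledPoint` that you have already created, and `previous_weights` holds the weight vector of the old model (or `None` for a cold start). The two function names are hypothetical helpers made up for this illustration; only `StreamingLogisticRegressionWithSGD`, `setInitialWeights`, `trainOn` and `latestModel` are Spark API.

```python
def initial_weights_or_zeros(previous_weights, num_features):
    """Return the old model's weights, or a zero vector for a cold start."""
    if previous_weights is None:
        return [0.0] * num_features
    weights = [float(w) for w in previous_weights]
    if len(weights) != num_features:
        raise ValueError(
            "expected %d weights, got %d" % (num_features, len(weights))
        )
    return weights


def train_on_stream(training_stream, previous_weights, num_features):
    """Attach a warm-started streaming logistic regression to a DStream.

    training_stream is assumed to be a DStream of pyspark.mllib LabeledPoint.
    """
    # Imported here so the helper above can be used without a Spark install.
    from pyspark.mllib.classification import StreamingLogisticRegressionWithSGD

    model = StreamingLogisticRegressionWithSGD(stepSize=0.1, numIterations=50)
    # Start from the old weights instead of from scratch.
    model.setInitialWeights(
        initial_weights_or_zeros(previous_weights, num_features)
    )
    # Registers the update; weights change once the StreamingContext is started.
    model.trainOn(training_stream)
    return model  # model.latestModel() exposes the current weights
```

If the old model was saved as an mllib `LogisticRegressionModel`, its `weights` attribute is the natural value to pass as `previous_weights`. Nothing is trained until the surrounding `StreamingContext` is started.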

  • SVM

In theory it could be implemented similarly to the streaming regressions StreamingLogisticRegressionWithSGD or StreamingLinearRegressionWithSGD, by extending StreamingLinearAlgorithm, but there is no such built-in implementation, and since org.apache.spark.mllib is in maintenance mode, there won't be.
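To make the idea concrete, here is a plain-Python sketch (not Spark code, and not how Spark implements anything) of what such a streaming linear SVM would do conceptually: each incoming batch runs a few subgradient steps on the L2-regularised hinge loss, starting from the weights left by the previous batch. The function names, the 0/1 label convention and the hyperparameter defaults are all assumptions chosen for illustration.

```python
def hinge_sgd_update(weights, batch, step_size=0.1, reg_param=0.01,
                     num_iterations=10):
    """Run a few full-batch subgradient steps on the regularised hinge loss.

    weights: starting weight vector (the warm start).
    batch:   list of (label, features) pairs with labels in {0, 1}.
    """
    w = list(weights)
    for _ in range(num_iterations):
        # Gradient of the L2 penalty (reg_param / 2) * ||w||^2.
        grad = [reg_param * wi for wi in w]
        for label, features in batch:
            y = 1.0 if label > 0.5 else -1.0  # map {0, 1} to {-1, +1}
            margin = y * sum(wi * xi for wi, xi in zip(w, features))
            if margin < 1.0:  # only margin violations contribute
                for j, xj in enumerate(features):
                    grad[j] -= y * xj / len(batch)
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w


def predict(weights, features):
    """Return 1 if the point is on the positive side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(weights, features))
    return 1 if score > 0 else 0


# Two micro-batches; the first feature is a constant 1.0 acting as a bias.
batch_1 = [(1, [1.0, 2.0]), (0, [1.0, -2.0])]
batch_2 = [(1, [1.0, 3.0]), (0, [1.0, -1.5])]

weights = [0.0, 0.0]                           # cold start
weights = hinge_sgd_update(weights, batch_1)   # learn from the first batch
weights = hinge_sgd_update(weights, batch_2)   # continue from those weights
```

The second call is the incremental step: it does not restart from zeros but refines the weights already learned from `batch_1`.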