7
votes

I am using the Spark 1.5.0 MLlib Random Forest algorithm (Scala code) for two-class classification. Because the dataset I am using is highly imbalanced, the majority class is down-sampled at a 10% sampling rate.
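
For context, the down-sampling is roughly along these lines (a minimal sketch; the label encoding and seed are just illustrative, not my exact code):

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // data: RDD[LabeledPoint] where label 0.0 is the majority class (assumed layout)
    def downSampleMajority(data: RDD[LabeledPoint]): RDD[LabeledPoint] = {
      val minority = data.filter(_.label == 1.0)
      // keep roughly 10% of the majority class, sampled without replacement
      val majority = data.filter(_.label == 0.0)
        .sample(withReplacement = false, fraction = 0.1, seed = 42L)
      minority.union(majority)
    }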

Is it possible to use the sampling weight (10 in this case) in Spark Random Forest training? I don't see a weight parameter among the inputs to trainClassifier() in RandomForest.
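
For reference, this is the kind of trainClassifier() call I mean (a sketch with illustrative hyperparameters):

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.RandomForest
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.rdd.RDD

    def train(trainingData: RDD[LabeledPoint]): RandomForestModel = {
      val numClasses = 2
      val categoricalFeaturesInfo = Map[Int, Int]()  // all features treated as continuous
      val numTrees = 100                             // illustrative values
      val featureSubsetStrategy = "auto"
      val impurity = "gini"
      val maxDepth = 5
      val maxBins = 32
      // none of these parameters takes per-example or per-class weights
      RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo,
        numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)
    }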


1 Answer

2
votes

Not at all in Spark 1.5, and only partially (LogisticRegression/LinearRegression) in Spark 1.6:

https://issues.apache.org/jira/browse/SPARK-7685
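
For example, in Spark 1.6 the spark.ml LogisticRegression exposes instance weights through setWeightCol. A minimal sketch, assuming a DataFrame that already carries a weight column (the column names are assumptions):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.sql.DataFrame

    // training: DataFrame with "label", "features" and a "weight" column (assumed schema);
    // e.g. weight = 10.0 for the kept majority-class rows, 1.0 for the minority class
    def trainWeighted(training: DataFrame) = {
      val lr = new LogisticRegression()
        .setLabelCol("label")
        .setFeaturesCol("features")
        .setWeightCol("weight")  // instance weights, added in Spark 1.6 (SPARK-7685)
      lr.fit(training)
    }

There is no equivalent for tree-based models in these versions, so re-weighting via sampling (as you are already doing) remains the practical workaround.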

Here's the umbrella JIRA tracking all the subtasks:

https://issues.apache.org/jira/browse/SPARK-9610