I'm building a random forest classification model whose response variable is split 98% (False) / 2% (True). I'm using scikit-learn's RandomForestClassifier for this.

What is the best way to handle this imbalanced data while avoiding oversampling?

Comment: I have already answered the question here, please check: stackoverflow.com/a/36255925/2523817 – Sagar Waghmode

2 Answers

You can use the class_weight parameter.

Weights associated with classes in the form {class_label: weight}

You can give more weight to your minority class and find the best weights using cross-validation.

For example, class_weight={1: 10, 0: 1} gives ten times more weight to the class labeled 1.
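A minimal sketch of tuning the minority-class weight by cross-validation, assuming scikit-learn is available; the dataset here is synthetic (generated with make_classification to mimic the 98/2 split in the question) and the candidate weights are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data: roughly 98% negatives, 2% positives.
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=0)

# Try a few candidate weights for the minority class (label 1).
candidates = [{0: 1, 1: w} for w in (1, 5, 10, 50)]

# Score on F1 rather than accuracy: with a 98/2 split, a classifier that
# always predicts the majority class already scores ~98% accuracy.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"class_weight": candidates},
    scoring="f1",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The chosen weight will depend on your data and metric, so it is worth comparing precision and recall for each candidate, not just a single score.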

In newer versions of scikit-learn's RandomForestClassifier, you can simply set class_weight="balanced", which weights each class inversely proportional to its frequency in the training data.
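A short sketch of this option, again on a synthetic 98/2 dataset for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced data: roughly 98% negatives, 2% positives.
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=0)

# "balanced" computes weights as n_samples / (n_classes * bincount(y)),
# so the rare class is automatically upweighted without manual tuning.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```

There is also class_weight="balanced_subsample", which recomputes the weights per bootstrap sample drawn for each tree rather than once over the whole training set.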