I'm building a random forest classification model whose response variable is split 98% (False) / 2% (True). I'm using scikit-learn's RandomForestClassifier for this.

What is the best way to handle this imbalanced data while avoiding oversampling?

Comment: I have already answered the question here, please check: stackoverflow.com/a/36255925/2523817 – Sagar Waghmode

2 Answers

You can use the class_weight parameter.

Weights associated with classes in the form {class_label: weight}

You can give more weight to your minority class and find the best weights using cross-validation.

For example, class_weight={1: 10, 0: 1} gives ten times more weight to the class labeled 1.
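A minimal sketch of tuning the minority-class weight by cross-validation, assuming scikit-learn is available; the dataset here is synthetic (generated with make_classification to mimic the 98/2 split in the question) and the candidate weights are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data: roughly 98% negatives, 2% positives.
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=0)

# Try a few candidate weights for the minority class (label 1).
candidates = [{0: 1, 1: w} for w in (1, 5, 10, 50)]

# Score on F1 rather than accuracy: with a 98/2 split, a classifier that
# always predicts the majority class already scores ~98% accuracy.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"class_weight": candidates},
    scoring="f1",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

The chosen weight will depend on your data and metric, so it is worth comparing precision and recall for each candidate, not just a single score.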

In newer versions of scikit-learn's RandomForestClassifier, you can simply set class_weight="balanced", which weights each class inversely proportional to its frequency in the training data.
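A short sketch of this option, again on a synthetic 98/2 dataset for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced data: roughly 98% negatives, 2% positives.
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=0)

# "balanced" computes weights as n_samples / (n_classes * bincount(y)),
# so the rare class is automatically upweighted without manual tuning.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```

There is also class_weight="balanced_subsample", which recomputes the weights per bootstrap sample drawn for each tree rather than once over the whole training set.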