I have a class imbalance problem and been experimenting with a weighted Random Forest using the implementation in scikit-learn (>= 0.16).
I have noticed that the implementation takes a class_weight parameter in the tree constructor and sample_weight parameter in the fit method to help solve class imbalance. Those two seem to be multiplied though to decide a final weight.
I have trouble understanding the following:
- In what stages of the tree construction/training/prediction are those weights used? I have seen some papers for weighted trees, but I am not sure what scikit implements.
- What exactly is the difference between class_weight and sample_weight?