I am applying scikit-learn's random forest to an extremely unbalanced dataset (class ratio of 1:10 000). I can use the class_weight='balanced' parameter, and I have read that it is equivalent to undersampling.
However, this method seems to apply weights to the samples and does not change the actual number of samples.
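To make this concrete, here is a minimal sketch of the weighting as I understand it from the scikit-learn documentation, which describes the 'balanced' mode as n_samples / (n_classes * np.bincount(y)). The helper name balanced_class_weights and the toy labels are my own, purely for illustration; this is not scikit-learn's code.

```python
from collections import Counter


def balanced_class_weights(y):
    """Per-class weights following the documented 'balanced' heuristic:
    n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical toy labels with the 1:10 000 ratio from the question.
y = [0] * 10_000 + [1]
weights = balanced_class_weights(y)

# Each class ends up with the same total weight...
total_weight_majority = weights[0] * y.count(0)
total_weight_minority = weights[1] * y.count(1)

# ...but the number of samples itself is untouched.
n_samples_after = len(y)

print(weights, total_weight_majority, total_weight_minority, n_samples_after)
```

So the two classes carry equal total weight, but no sample is added or removed, which is why I doubt the equivalence with undersampling.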
Because each tree of the Random Forest is built on a randomly drawn subsample of the training set, I am afraid the minority class will not be sufficiently represented (or not represented at all) in each subsample. Is this true? This would lead to very biased trees.
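A rough way to quantify this worry, assuming each tree draws a bootstrap sample of the same size as the training set, with replacement (which I believe is the default behaviour when bootstrap=True and max_samples is left unset): the chance that a given tree sees no minority sample at all is (1 - n_minority / n_total) ** n_total. The dataset sizes below are hypothetical, chosen only to match the 1:10 000 ratio.

```python
import math


def prob_no_minority(n_total, n_minority):
    """Probability that n_total draws with replacement from n_total samples
    contain none of the n_minority minority samples."""
    return (1.0 - n_minority / n_total) ** n_total


# Hypothetical dataset sizes, all at a 1:10 000 class ratio.
results = {}
for n_minority in (1, 5, 100):
    n_total = n_minority * 10_001  # n_minority positives + 10 000x as many negatives
    results[n_minority] = prob_no_minority(n_total, n_minority)
    print(f"{n_minority:>4} minority samples: "
          f"P(tree sees none) = {results[n_minority]:.3g} "
          f"(approx. exp(-{n_minority}) = {math.exp(-n_minority):.3g})")
```

If this reasoning holds, the risk depends on the absolute number of minority samples rather than on the ratio alone: with only a handful of positives many trees would miss them entirely, whereas with a hundred or more nearly every tree would contain some, although still very few compared with the majority class.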
Thus, my question is: does the class_weight="balanced" parameter allow building reasonably unbiased Random Forest models on extremely unbalanced datasets, or should I find a way to undersample the majority class, either for each tree or when building the training set?