What is the difference between including oob_score=True and not including it in RandomForestClassifier in sklearn (Python)? The out-of-bag (OOB) error is the average error for each training sample, calculated using predictions from only the trees that did not contain that sample in their bootstrap sample, right? So how does setting oob_score=True affect the calculation of the average error?
1 Answer
Each tree is trained on a bootstrap sample of the data, so only a share of the samples is used to build it. The remaining samples are that tree's out-of-bag samples. Because a tree never saw its out-of-bag samples, they can be used directly during training to estimate test accuracy without a separate validation set. If you activate the option, "oob_score_" and "oob_decision_function_" will be computed ("oob_prediction_" is the counterpart on RandomForestRegressor).
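As a minimal sketch of what the option exposes, assuming a synthetic dataset from make_classification (the data, sizes, and seeds here are illustrative choices, not part of the question), the attributes can be inspected like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (an assumption, not from the question).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# bootstrap=True (the default) is required for OOB estimates.
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

# Accuracy estimated for each sample from the trees that did not train on it.
oob_accuracy = clf.oob_score_
# Per-sample class probabilities from the OOB trees, shape (n_samples, n_classes).
oob_proba = clf.oob_decision_function_

# Without the option, the OOB attributes are simply not created.
clf_plain = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
has_oob_attr = hasattr(clf_plain, "oob_score_")

print(oob_accuracy)
print(oob_proba.shape)
print(has_oob_attr)
```

With few trees, some samples may never be out-of-bag and scikit-learn warns about it, so a reasonably large n_estimators is used here.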
Enabling the option does not change the trained model. Because of the random nature of RF, two fits will not be exactly the same unless you fix the random seed, but that has nothing to do with the "oob_score" option. Older versions of scikit-learn did not let you set the OOB ratio, which is determined by the percentage of samples used to build each tree; since version 0.22, the "max_samples" parameter controls how many samples are drawn for each tree's bootstrap. Other libraries also expose this setting (e.g. C++ Shark http://image.diku.dk/shark/sphinx_pages/build/html/rest_sources/tutorials/algorithms/rf.html).
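One way to check that the option leaves the model itself untouched is to fit two forests that differ only in oob_score, with the same random_state, and compare their predictions. This is a rough sketch on synthetic data (the dataset and seed values are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Same seed and settings; the only difference is the oob_score flag.
with_oob = RandomForestClassifier(
    n_estimators=50, oob_score=True, random_state=42
).fit(X, y)
without_oob = RandomForestClassifier(
    n_estimators=50, oob_score=False, random_state=42
).fit(X, y)

# Compare the predicted class probabilities of the two forests.
same_predictions = np.allclose(
    with_oob.predict_proba(X), without_oob.predict_proba(X)
)
print(same_predictions)
```

If the two forests agree, the flag only adds the OOB bookkeeping after fitting; any difference between two runs then comes from the seed, not from oob_score.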