4
votes

I am trying to access the out-of-bag samples associated with each tree in a RandomForestClassifier, with no luck. I found other information, such as the Gini score and the split feature for each node, by looking here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx

Does anyone know whether it is possible to get the out-of-bag sample for a given tree? If not, maybe it is possible to get the 'in-bag' sample (the subset of the dataset used for a specific tree) and then compute the OOB sample from the original dataset?

Thanks in advance

1 Answer

8
votes

You can figure this out from the source code: look at how the private _set_oob_score method of the random forest works. Every tree estimator in scikit-learn has its own seed for the pseudo-random number generator, stored in the estimator.random_state field.

During fitting, every estimator learns on a subset of the training set. The indices of that subset are generated with a PRNG seeded from estimator.random_state.

This should work:

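from sklearn.ensemble.forest import _generate_unsampled_indices
# Note: this is a private helper. In newer scikit-learn releases (0.22+, as far as I know)
# it lives in sklearn.ensemble._forest and takes a third argument, n_samples_bootstrap.

# X is the training set the forest rf was fitted on
n_samples = X.shape[0]
for tree in rf.estimators_:
    # Out-of-bag sample indices for this tree
    unsampled_indices = _generate_unsampled_indices(
        tree.random_state, n_samples)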
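
To make the mechanism concrete, here is a minimal NumPy-only sketch of the idea described above: seed a PRNG, draw the in-bag indices with replacement, and treat every row that was never drawn as out of bag. The function name bootstrap_split is hypothetical, and the sketch assumes the seed is an integer and that the bootstrap sample has the same size as the training set. It illustrates the technique and is not scikit-learn's own code.

```python
import numpy as np


def bootstrap_split(seed, n_samples):
    """Return (in_bag, out_of_bag) index arrays for one bootstrap draw.

    Hypothetical helper for illustration only, not part of scikit-learn.
    """
    rng = np.random.RandomState(seed)
    # Draw n_samples row indices with replacement: the tree's training subset.
    in_bag = rng.randint(0, n_samples, n_samples)
    # Count how often each row was drawn; rows never drawn are out of bag.
    counts = np.bincount(in_bag, minlength=n_samples)
    out_of_bag = np.flatnonzero(counts == 0)
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_split(seed=0, n_samples=10)
print("in bag:    ", np.unique(in_bag))
print("out of bag:", out_of_bag)
```

This also answers the second half of the question: once the in-bag indices are known, the OOB rows are simply the remaining indices of the original dataset, for example X[out_of_bag].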