
I'm having a hard time finding out what oob_score_ means for the Random Forest Regressor in scikit-learn. The documentation says:

oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.

At first I thought it would return the score for each instance in the set of out-of-bag instances. But that is given by the attribute:

oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.

This returns an array containing the prediction for each instance. Then, looking at the other members in the documentation, I noticed that the method score(X, y, sample_weight=None) returns the coefficient of determination R².

Considering that calling the attribute oob_score_ returns a single float value, what does it represent? If possible, I would like to know as well how it is computed.

The link to the documentation is RandomForestRegressor.


1 Answer


It returns exactly what the documentation says:

oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.

where the score is:

score(X, y, sample_weight=None) returns the Coefficient of determination R².

and the out-of-bag estimate is computed on the samples that each tree did not see during training because of the bagging procedure.

Just look at the source, lines 727-740:

    predictions /= n_predictions
    self.oob_prediction_ = predictions

    if self.n_outputs_ == 1:
        self.oob_prediction_ = \
            self.oob_prediction_.reshape((n_samples, ))

    self.oob_score_ = 0.0

    for k in range(self.n_outputs_):
        self.oob_score_ += r2_score(y[:, k],
                                    predictions[:, k])

    self.oob_score_ /= self.n_outputs_

In other words, it is just the R² score between the training targets and oob_prediction_ (averaged over the outputs when there is more than one).
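A quick way to check this yourself is sketched below. It assumes a reasonably recent scikit-learn with the default oob_score=True behaviour, and it uses synthetic single-output data from make_regression, which is my own choice for illustration and not something from the question. The names forest and manual_oob_r2 are likewise just illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic single-output regression data (illustrative assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# oob_score=True makes the forest compute out-of-bag predictions during fit;
# it relies on bootstrap sampling, which is enabled by default.
forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)

# R² between the true training targets and the out-of-bag predictions.
manual_oob_r2 = r2_score(y, forest.oob_prediction_)

print(forest.oob_score_)
print(manual_oob_r2)
print(np.isclose(forest.oob_score_, manual_oob_r2))
```

With enough trees every training sample should be out-of-bag for at least one of them, so the two printed scores should agree. With very few trees some samples may never be out-of-bag, and scikit-learn warns that the estimate is unreliable.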