0
votes

I've trained a Random Forest model (a regressor in this case) using scikit-learn (Python), and I would like to plot the error rate on a validation set as a function of the number of estimators used. In other words, is there a way to predict using only a portion of the estimators in a RandomForestRegressor?

Using predict(X) gives predictions based on the mean of every tree's result. Is there a way to limit which trees are used? Or, alternatively, to get the individual output of each tree in the forest?


2 Answers

2
votes

Thanks to cohoz I've figured out how to do it. I've written a couple of functions, which turned out to be handy for plotting the learning curve of the random forest regressor on the test set.

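## Error metric
import numpy as np
def rmse(pred, actual):
    # Root mean squared error between predictions and known values
    return np.sqrt(np.mean((actual - pred) ** 2))

## Print test set error
## Input the RandomForestRegressor, test set features and test set known values
def rfErrCurve(rf_model, test_X, test_y):
    p = []
    for tree in rf_model.estimators_:
        # Collect this tree's predictions alongside those of the earlier trees
        p.append(tree.predict(test_X))
        # RMSE of the forest restricted to the trees collected so far
        print(rmse(np.mean(p, axis=0), test_y))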
1
votes

Once trained, you can access these via the "estimators_" attribute of the random forest object.
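
As a minimal sketch of that idea: the synthetic data from make_regression, the train/test split, and the helper name per_tree_predictions are assumptions made for illustration, not part of the question. It stacks the output of each fitted tree and then averages the first k rows to get the prediction of a forest limited to k trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def per_tree_predictions(rf_model, X):
    """Return an array of shape (n_trees, n_samples), one row per tree.

    Hypothetical helper, not part of scikit-learn.
    """
    return np.stack([tree.predict(X) for tree in rf_model.estimators_])

# Synthetic data, purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_train, y_train)

preds = per_tree_predictions(rf, X_test)

# Prediction of the forest restricted to its first k trees, for k = 1..n_trees
counts = np.arange(1, len(preds) + 1)[:, None]
cumulative = np.cumsum(preds, axis=0) / counts

# RMSE on the held-out rows as a function of the number of trees used
errors = np.sqrt(np.mean((cumulative - y_test) ** 2, axis=1))
```

The errors array can then be passed to any plotting library to draw the curve; its last entry corresponds to using every tree, which should agree with rf.predict up to floating-point rounding.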