Is there a way to get the predictions from every tree in a random forest in addition to the combined prediction? I would like to output all of the predictions in a list and not view the entire tree. I know that I can get the leaf indices using the apply method, but I'm not sure how to use that to get the value from the leaf.

Edit: Here's what I have so far, based on the comments below. It wasn't clear to me before that the trees in the estimators_ attribute could be accessed individually, but the predict method can be called on each tree through that attribute. Is this the best way to do this, though?

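from sklearn.ensemble import RandomForestRegressor
numberTrees = 100
clf = RandomForestRegressor(n_estimators=numberTrees)
clf.fit(X, Y)
# each fitted tree is stored in clf.estimators_ and has its own predict()
for tree in range(numberTrees):
    # val.irow() no longer exists in current pandas; iloc[[1]] selects row 1 as a one-row DataFrame
    print(clf.estimators_[tree].predict(val.iloc[[1]]))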
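A self-contained version of the loop above, as a sketch: synthetic data from make_regression stands in for the question's X, Y and val (an assumption), and a list comprehension over clf.estimators_ collects one prediction per tree for a single row into a plain Python list. The forest's own prediction should equal the mean of that list, up to floating-point rounding.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data stands in for the question's X, Y and val (assumption)
X, Y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
clf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)

row = X[1:2]  # one sample, kept 2-D because predict() expects a 2-D input
# One prediction per fitted tree, collected into a plain Python list
per_tree = [tree.predict(row)[0] for tree in clf.estimators_]

print(len(per_tree))  # 100
# The forest's prediction is the mean of the per-tree predictions
print(np.mean(per_tree), clf.predict(row)[0])
```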
+1 – You can also do the same thing quite nicely with a list comprehension: per_tree_pred = [tree.predict(X) for tree in clf.estimators_] – Bill Cheatham
If you want to match the parallel-jobs behavior of the model, just copy the source code for predict and leave off the last step where the predictions are averaged! – Matt Hancock
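A rough sketch of the parallel idea, assuming joblib (which sklearn itself depends on) and synthetic data from make_regression. This is not a copy of sklearn's predict: it simply runs each tree's predict() through joblib.Parallel with the forest's n_jobs setting and skips the averaging, so every tree's output is kept as its own column.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
clf = RandomForestRegressor(n_estimators=50, n_jobs=2, random_state=0).fit(X, y)

# Run each tree's predict() in parallel threads, reusing the forest's n_jobs
per_tree = Parallel(n_jobs=clf.n_jobs, prefer="threads")(
    delayed(tree.predict)(X) for tree in clf.estimators_
)
# Parallel returns results in input order: one column per tree
per_tree = np.column_stack(per_tree)  # shape (n_samples, n_estimators)

# Averaging the columns is the step the forest's predict() adds on top
print(np.allclose(per_tree.mean(axis=1), clf.predict(X)))
```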

4 Answers

4
votes

I'm pretty sure that what you have up there is about the best you can do. As you noted, predict() returns the prediction for the whole RF, not for its component trees. It can return a matrix, but only when multiple targets are learned together; in that case it returns one prediction per target, not one per tree. In R's randomForest package you can get the individual tree predictions with predict.all = TRUE, but sklearn has no equivalent option. If you used apply(), you'd get a matrix of leaf indices, and you'd still have to iterate over the trees to look up the prediction for each tree/leaf combination. So I think what you have is about as good as it gets.
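For completeness, here is roughly what the apply() route looks like. It is a sketch against sklearn's low-level tree_ attribute: tree_.value holds each node's stored output, and for a single-output regressor the entry [leaf, 0, 0] is that leaf's prediction. The data from make_regression is an assumption. As described above, it still loops over the trees, and it yields the same numbers as calling predict() on each tree.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
clf = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)

# Leaf index reached in every tree: shape (n_samples, n_estimators)
leaves = clf.apply(X)

# For a regressor, tree_.value has shape (node_count, n_outputs, 1);
# indexing it with the leaf ids gives that tree's prediction per sample.
per_tree = np.column_stack([
    est.tree_.value[leaves[:, i], 0, 0]
    for i, est in enumerate(clf.estimators_)
])

# Same numbers as calling predict() on each tree directly
direct = np.column_stack([est.predict(X) for est in clf.estimators_])
print(np.allclose(per_tree, direct))  # True
```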

1
votes

I had the same issue, and I don't know how you got the right answer by using print(clf.estimators_[tree].predict(val.irow(1))). It gave me numbers (the individual trees are trained on class indices) instead of the actual class labels. After reading the sklearn source code, I realized that we actually have to use predict_proba() instead of predict(); it gives you the class the tree predicts, following the order in clf.classes_. For example:

tree_num = 2
tree_pred = clf.estimators_[tree_num].predict_proba(data_test)
print(clf.classes_)  # gives you the order of the classes
print(tree_pred)     # usually 0s, with the predicted class as 1
# e.g. for a single test sample:
# ['class1' 'class2' 'class3']
# [[0. 1. 0.]]

You can also use clf.predict_proba() on your data; it gives you the probability of each class averaged over all the trees, which saves you from going through each tree yourself:

x = clf.predict_proba(data_test)  # assume data_test has two instances
print(clf.classes_)
print(x)
# e.g.
# ['class1' 'class2' 'class3']
# [[0.12 0.02 0.86]   <- probabilities for the first instance
#  [0.35 0.01 0.64]]  <- probabilities for the second instance
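A sketch for the classifier case, assuming the iris dataset with string labels as stand-in data. It shows both routes: mapping a tree's predict() output (a class index stored as a float) back to a label through clf.classes_, and stacking every tree's predict_proba() into one array. The last line checks that the forest's predict_proba() is the average of the per-tree probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # string labels such as 'setosa'
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Each tree was trained on class indices, so predict() returns 0., 1. or 2.
idx = clf.estimators_[2].predict(X[:3])
labels = clf.classes_[idx.astype(int)]  # map indices back to class names

# Per-tree probabilities: shape (n_estimators, n_samples, n_classes)
per_tree_proba = np.stack([t.predict_proba(X) for t in clf.estimators_])
# Each tree's vote as a label: shape (n_samples, n_estimators)
per_tree_labels = clf.classes_[per_tree_proba.argmax(axis=2)].T

# The forest's predict_proba() is the mean of the trees' probabilities
print(np.allclose(per_tree_proba.mean(axis=0), clf.predict_proba(X)))  # True
```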
0
votes

What I have done recently is modify the sklearn source code to get it. Inside the sklearn package, in the source for sklearn.ensemble.RandomForestRegressor, there is a function where, if you add a print, you will see the individual result of each tree.

You could change the print to a return (and have predict() collect the returned values) to get the individual results of each tree.

def _accumulate_prediction(predict, X, out, lock):
    """
    This is a utility function for joblib's Parallel.

    It can't go locally in ForestClassifier or ForestRegressor, because joblib
    complains that it cannot pickle it when placed there.
    """
    prediction = predict(X, check_input=False)
    print(prediction)  # added: prints each tree's individual prediction
    with lock:
        if len(out) == 1:
            out[0] += prediction
        else:
            for i in range(len(out)):
                out[i] += prediction[i]

This is a bit more involved, since you have to modify the sklearn source code.
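To see the same accumulation logic without editing the installed package, a minimal sketch can mirror it in user code. The helper accumulate_and_record is hypothetical (not part of sklearn, and sklearn's private _accumulate_prediction may differ between versions): like the function above, it adds each tree's prediction into out under a lock, but it also appends the raw prediction to a per_tree list. It assumes joblib's threading backend via require="sharedmem" and synthetic data from make_regression.

```python
import threading

import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
clf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

def accumulate_and_record(predict, X, out, per_tree, lock):
    """Hypothetical variant of the accumulation helper that also keeps each tree's output."""
    prediction = predict(X)
    with lock:
        out[0] += prediction        # running sum, as in the original helper
        per_tree.append(prediction)  # extra: keep the individual result

out = [np.zeros(X.shape[0])]
per_tree = []
lock = threading.Lock()
# Threads share memory, so out and per_tree are updated in place
Parallel(n_jobs=2, require="sharedmem")(
    delayed(accumulate_and_record)(tree.predict, X, out, per_tree, lock)
    for tree in clf.estimators_
)

# Dividing the sum by the number of trees reproduces the forest's predict()
forest_pred = out[0] / len(clf.estimators_)
print(np.allclose(forest_pred, clf.predict(X)))  # True
```

Because the trees run in parallel threads, the order of per_tree is not guaranteed to match clf.estimators_; use an ordered approach if you need to know which tree produced which prediction.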

-1
votes

I am not 100% sure what exactly you want, but there are some other methods in scikit-learn's Random Forest Regressor that will most likely return what you want, specifically the predict method! This method returns an array of the predicted values. What you were referring to about getting the mean is the score method, which simply uses the predict method and returns the coefficient of determination (R²) of the predictions.
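A short check of the relationship between predict and score, using synthetic data from make_regression (an assumption). For a regressor, score(X, y) is the R² of predict(X) against y, so it matches sklearn.metrics.r2_score computed on the same predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
clf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

y_pred = clf.predict(X)  # one (tree-averaged) prediction per sample
print(y_pred.shape)      # (200,)
# score() is the coefficient of determination of those predictions
print(clf.score(X, y), r2_score(y, y_pred))
```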