How is the impurity decrease of a split computed for multiple outputs in scikit-learn's RandomForestRegressor?

Question

I am using the RandomForestRegressor class of the scikit-learn library (Python 3.x), and I am aware that the criterion used to measure the quality of a split in a decision tree is variance reduction (mse). Given that RandomForestRegressor supports multiple outputs, my question is: how is the quality of a split computed when there are multiple outputs?

From reading the source code of the class that defines the split criterion, I would say that the impurity decrease of a split in a tree is computed as the average impurity decrease over all output variables, and hence only one model is built for all outputs. Is that the default behaviour of scikit-learn's RandomForestRegressor class? I was hoping someone could take a look with me, as I am not completely sure whether my statements are correct!

Many thanks in advance!

https://github.com/scikit-learn/scikit-learn/blob/a24c8b464d094d2c468a16ea9f8bf8d42d949f84/sklearn/tree/_criterion.pyx#L695

Paul Paul · Accepted Answer · 2018-06-06T08:50:02

One of the authors of the corresponding scikit-learn class (Gilles Louppe) was kind enough to answer my question: the above understanding is correct. The reduction of variance is computed for each output variable and then averaged to produce the final score.
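
For anyone who wants to check this numerically, here is a minimal sketch. It is not taken from the scikit-learn source; it assumes a single DecisionTreeRegressor stump as a stand-in for one tree of the forest, and the toy data and the helper name `mean_output_variance` are made up for illustration. It compares the impurities stored in the fitted tree with the per-output variance averaged over the outputs.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: two outputs on deliberately different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([
    X[:, 0] + 0.1 * rng.normal(size=200),
    10.0 * X[:, 1] + rng.normal(size=200),
])

# A depth-1 tree (one split), standing in for a single tree of the forest.
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, Y)
tree = stump.tree_

def mean_output_variance(y):
    """Variance of each output column, averaged over the outputs."""
    return np.var(y, axis=0).mean()

# Recover which samples went to the left child of the root.
left_id, right_id = tree.children_left[0], tree.children_right[0]
in_left = stump.apply(X) == left_id

root_manual = mean_output_variance(Y)
left_manual = mean_output_variance(Y[in_left])
right_manual = mean_output_variance(Y[~in_left])

# Impurity decrease of the root split: parent impurity minus the
# sample-weighted impurities of the two children.
decrease_manual = (root_manual
                   - in_left.mean() * left_manual
                   - (~in_left).mean() * right_manual)

print("root impurity :", tree.impurity[0], "manual:", root_manual)
print("left impurity :", tree.impurity[left_id], "manual:", left_manual)
print("right impurity:", tree.impurity[right_id], "manual:", right_manual)
print("impurity decrease of the root split:", decrease_manual)
```

The stored and manually computed impurities should agree up to floating-point error. One consequence worth keeping in mind: because the raw per-output variances are averaged, an output on a much larger scale tends to dominate the choice of split, so rescaling the outputs beforehand may be worth considering.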
