
I am using the RandomForestRegressor class of the scikit-learn library (Python 3.x), and I am aware that the criterion used to measure the quality of a split in a decision tree is the variance reduction (mean squared error, "mse"). Given that RandomForestRegressor supports multiple outputs, my question is: how is the quality of a split computed when there are multiple outputs in this particular class?

From reading the source code of the class that defines the split criterion (linked below), I would say that the impurity decrease of a split in a tree is computed as the average impurity decrease over all output variables. Hence only one model is built for all outputs together, rather than one model per output. Is that the default behaviour of the scikit-learn RandomForestRegressor class? I would appreciate it if someone could check this with me, as I am not completely sure whether my statements are correct.

Many thanks in advance!

https://github.com/scikit-learn/scikit-learn/blob/a24c8b464d094d2c468a16ea9f8bf8d42d949f84/sklearn/tree/_criterion.pyx#L695
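A minimal NumPy sketch of the reading described in the question, assuming the per-node impurity is the population variance (MSE around the node mean) of each output column, averaged over the columns. The helper names node_impurity, impurity_decrease and per_output_decrease are hypothetical illustrations, not scikit-learn API. Because averaging is linear, the decrease of the averaged impurity equals the average of the per-output decreases, so "average impurity" and "average impurity decrease" describe the same score. scikit-learn additionally scales the decrease by the node's share of the training samples (this matters for min_impurity_decrease and feature importances); the sketch omits that factor, which is the same for every candidate split at a node and so does not change which split wins.

```python
import numpy as np

def node_impurity(y):
    """Hypothetical helper: mean over outputs of the per-output variance.

    y has shape (n_samples, n_outputs); np.var uses ddof=0 (population variance)."""
    return np.var(y, axis=0).mean()

def impurity_decrease(y, left_mask):
    """Weighted decrease of the averaged impurity for a split into y[left_mask] / y[~left_mask]."""
    n = len(y)
    y_left, y_right = y[left_mask], y[~left_mask]
    children = (len(y_left) / n) * node_impurity(y_left) + (len(y_right) / n) * node_impurity(y_right)
    return node_impurity(y) - children

def per_output_decrease(y, left_mask):
    """Same split, but the variance decrease is kept separately for each output column."""
    n = len(y)
    y_left, y_right = y[left_mask], y[~left_mask]
    children = (len(y_left) / n) * np.var(y_left, axis=0) + (len(y_right) / n) * np.var(y_right, axis=0)
    return np.var(y, axis=0) - children

rng = np.random.default_rng(0)
x = rng.normal(size=100)
# Two synthetic outputs that both depend on the single feature x
y = np.column_stack([x + rng.normal(scale=0.1, size=100),
                     -2.0 * x + rng.normal(scale=0.5, size=100)])

mask = x <= 0.0                      # candidate split threshold at 0
total = impurity_decrease(y, mask)
per_output = per_output_decrease(y, mask)
print(total, per_output, per_output.mean())  # total equals the mean of per_output
```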

Answer


One of the authors of the corresponding scikit-learn class (Gilles Louppe) was kind enough to answer my question: the above understanding is correct. The variance reduction is computed for each output variable and then averaged to produce the final score.
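The answer can be checked empirically. This sketch assumes a recent scikit-learn and uses a single DecisionTreeRegressor with its default squared-error criterion: each tree in RandomForestRegressor uses the same criterion, but bootstrap resampling would change the sample weights at the root, so a bare tree keeps the comparison exact. It compares the root impurity stored in tree_.impurity with the mean of the per-output variances and shows that one tree predicts both outputs at once. The two outputs are deliberately put on different scales: since the per-output variances are averaged without normalisation, the output with the larger variance dominates the choice of split, which is worth keeping in mind (e.g. by standardising the targets) when outputs have very different units.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
# Output 0 depends on feature 0 (small scale); output 1 depends on feature 1 (large scale)
y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=200),
                     10.0 * X[:, 1] + rng.normal(size=200)])

# A single depth-1 tree: one root split shared by both outputs
tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

root_impurity = tree.tree_.impurity[0]        # impurity scikit-learn stores for the root node
mean_of_variances = np.var(y, axis=0).mean()  # average of the per-output (population) variances
print(root_impurity, mean_of_variances)

# Feature index used at the root; the high-variance output drives this choice
print(tree.tree_.feature[0])

# One model, vector-valued predictions: shape (n_samples, n_outputs)
print(tree.predict(X).shape)
```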