
In Random Forest method, for each tree we randomly select a set of variables (features) of fixed size. But once this set is frozen for that particular tree, does the tree behave like a regular decision tree algorithm?

I am assuming that a random forest is nothing but a bunch of classical 'decision trees' whose votes are combined for the final classification. But many of the descriptions I have read seem to suggest that, for a given decision tree within the forest, we randomly select variables even at each node. Is that the case?

Does it mean that at each node in the tree, we randomly select m variables from the variable set which is fixed for that tree? Or from the global variable set of the training dataset? And then from the selected set of variables we select 1 variable heuristically (e.g. whichever variable maximises information gain) -- is that a correct statement?


1 Answer


"In Random Forest method, for each tree we randomly select a set of variables (features) of fixed size. But once this set is frozen for that particular tree, does the tree behave like a regular decision tree algorithm"

No. The feature subset is not frozen per tree; a new random subset is drawn at every node.

"I am assuming that a random forest is nothing but a bunch of classical 'decision trees' whose votes are combined for the final classification. But many of the descriptions I have read seem to suggest that, for a given decision tree within the forest, we randomly select variables even at each node. Is that the case?"

Yes.

"Does it mean that at each node in the tree, we randomly select m variables from the variable set which is fixed for that tree?"

This is slightly confusing. Does it assume that a larger subset is reserved for each tree, from which the m variables are then picked? In the standard algorithm there is no such per-tree subset: at each node, the m candidates are drawn from the full set of predictors in the training data. If the question is instead whether the tree is grown with the same randomly selected features at every node, the answer is no.

In Random Forest, the randomization of features takes place at each node. So if there are 100 predictors in total, then at each node of a tree a random subset of, say, 10 is drawn, and only those 10 are evaluated for the best split (for example, by information gain). Note that the number of features sampled at each node (m) is kept constant during the entire process of growing the tree; only the identity of the sampled features changes from node to node.
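To make the per-node sampling concrete, here is a minimal sketch in plain Python. It is not a real random forest: there is no data and no split criterion, and the names `candidate_features` and `grow_tree` are made up for illustration. It only records which m = 10 of the 100 features each node of one small tree would be allowed to consider.

```python
import random

def candidate_features(n_features, m, rng):
    """Draw m distinct feature indices from the full set of n_features."""
    return sorted(rng.sample(range(n_features), m))

def grow_tree(n_features, m, depth, rng, path="root", log=None):
    """Record the candidate features for every node of a full binary tree.

    Hypothetical helper: a real tree would evaluate the candidates on the
    data at the node and split on the best one (e.g. by information gain).
    """
    if log is None:
        log = {}
    # A fresh subset is drawn at this node, always from all n_features.
    log[path] = candidate_features(n_features, m, rng)
    if depth > 0:
        grow_tree(n_features, m, depth - 1, rng, path + "/L", log)
        grow_tree(n_features, m, depth - 1, rng, path + "/R", log)
    return log

rng = random.Random(0)  # fixed seed so the run is repeatable
node_candidates = grow_tree(n_features=100, m=10, depth=2, rng=rng)

for path, feats in node_candidates.items():
    print(path, feats)
```

Every node prints a list of the same length (m stays fixed), but the lists themselves differ from node to node, which is the point of the paragraph above.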

Hope this helps.