1
votes

I appreciate that bagging randomly resamples the training set for each tree, and that random forests randomly select a subset of features for each tree.

My question, though, is: does a random forest also resample the training set, in addition to taking a random subset of features? Is it, in effect, doubly random?


1 Answer

5
votes

The answer is yes, most of the time, if you want it to.

Random forests bootstrap the data and randomly select features. Bootstrapping means that each tree is trained on a sample of the same size as the original dataset, drawn with replacement. So if you have N data points, each tree will use N data points, but some may be duplicated (and others left out), because they are sampled one by one with replacement.
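To make the "same size, with replacement" point concrete, here is a minimal sketch using only the Python standard library. The helper name `bootstrap_sample`, the seed, and N = 1000 are my own choices for illustration, not part of any library:

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data, one by one, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)      # fixed seed, chosen only for reproducibility
data = list(range(1000))    # stand-in for N = 1000 training points
sample = bootstrap_sample(data, rng)

unique_fraction = len(set(sample)) / len(data)
print(len(sample))          # same size as the original dataset
print(unique_fraction)      # typically close to 1 - 1/e, i.e. about 0.63
```

So each tree sees N rows, but only roughly 63% of the distinct original points; the rest of its sample is made up of duplicates.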

However, it really is up to you what you do. In the sklearn implementation, the default is to bootstrap, but you can set bootstrap=False, in which case every tree is built on the whole dataset and the only randomness left is the random feature selection. See the documentation here: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
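As a rough sketch of how that flag is used, assuming a reasonably recent scikit-learn is installed (the synthetic dataset and the parameter values below are placeholders I picked for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Default behaviour: each tree is fit on a bootstrap sample of the rows.
rf_boot = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# bootstrap=False: each tree is fit on the whole dataset, so the remaining
# randomness comes from the feature subsets (controlled by max_features).
rf_full = RandomForestClassifier(
    n_estimators=20, bootstrap=False, random_state=0
).fit(X, y)

print(rf_boot.bootstrap, rf_full.bootstrap)
```

Note that in scikit-learn the random feature subset is drawn at each split rather than once per tree, and its size is set by `max_features`.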