I fitted a random forest model, using both the randomForest and ranger packages. I didn't tune the number of trees in the forest; I just left it at the default, which is 500. Now I would like to check whether that is enough, i.e. whether the error has reached a plateau. I believe I would need to extract the individual trees, take a random subset of, for example, 100, 200, 300, 400 and finally 500 trees, find the out-of-bag (OOB) observations for each of them, and calculate the OOB error for 100, 200, ... trees in turn. Then I could plot the OOB error against the number of trees. I found that randomForest::getTree and ranger::treeInfo each return a data.frame describing a tree, but I can't figure out what is what in there. What's more, ranger::treeInfo returns a data frame with 50% NAs, and the overall output is even harder to read. So my questions are:

  1. How can I plot the OOB error vs. the number of trees used in a forest if I already have a random forest with 500 trees?
  2. Why does ranger::treeInfo have 50% NAs, and why do only those rows have predictions?

Here's a minimal example:

library(dplyr)    # for %>%, mutate() and select()
library(ggplot2)  # for the mpg dataset

mpg2 <- mpg %>% 
  mutate(is_suv = as.factor(class == 'suv')) %>% 
  select(-class)
mpg_model <- ranger::ranger(is_suv ~ ., data = mpg2)
ranger::treeInfo(mpg_model, tree = 100)

1 Answer

I think what you're looking for is just plot(.), which for a randomForest object draws the error rates against the number of trees, as in this example:

library(randomForest)

set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris, importance = TRUE, proximity=TRUE)
# plot the model
plot(iris.rf)
# add legend to know which is which
legend("top", colnames(iris.rf$err.rate), fill=1:ncol(iris.rf$err.rate))
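
If you want the numbers behind that plot rather than the picture, here is a minimal sketch, assuming a classification forest on iris as above. For randomForest, the err.rate matrix has one row per forest size, and its "OOB" column holds the OOB error of the first k trees (the first k rather than a random k, which should not matter much since the trees are grown independently). For ranger, as far as I know the fitted object only keeps the final prediction.error, so the sketch simply refits at each size; the names checkpoints, rf_curve and ranger_curve are made up for illustration.

```r
library(randomForest)
library(ranger)

set.seed(71)
iris.rf <- randomForest(Species ~ ., data = iris)

# randomForest: one row of err.rate per forest size 1..ntree;
# the "OOB" column is the overall OOB error using the first k trees.
oob <- iris.rf$err.rate[, "OOB"]
checkpoints <- c(100, 200, 300, 400, 500)
rf_curve <- data.frame(ntree = checkpoints, oob_error = oob[checkpoints])
print(rf_curve)

# ranger: assumption: no per-tree OOB history is stored on the model,
# so refit with each num.trees and read the overall OOB prediction error.
ranger_curve <- data.frame(
  ntree = checkpoints,
  oob_error = sapply(checkpoints, function(n) {
    ranger(Species ~ ., data = iris, num.trees = n, seed = 71)$prediction.error
  })
)
print(ranger_curve)
```

If the oob_error column stops changing noticeably between the later checkpoints, the default of 500 trees is probably enough.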

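As for randomForest::getTree and ranger::treeInfo, they have nothing to do with the OOB error; they simply describe the structure of the chosen tree, i.e. which variable and value each node splits on and which child nodes it is connected to. Each package uses a slightly different representation. In the ranger::treeInfo output, terminal nodes have no split, so their child and split columns are NA, and only terminal nodes carry a prediction; that is where the NAs come from. The following, for example, comes from ranger::treeInfo: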

     nodeID leftChild rightChild splitvarID splitvarName splitval terminal prediction
1       0         1          2          4  Petal.Width     0.80    FALSE       <NA>
2       1        NA         NA         NA         <NA>       NA     TRUE     setosa

which is basically a description of a tree like this:

[image: diagram of the corresponding decision tree]
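
To convince yourself that the NAs in ranger::treeInfo are just the split/leaf distinction, here is a small sketch, assuming a classification forest on iris (the object names rf and info are arbitrary). It checks that rows without a split variable are exactly the terminal nodes, and that rows with a prediction are exactly the terminal nodes too.

```r
library(ranger)

rf <- ranger(Species ~ ., data = iris, num.trees = 10, seed = 1)
info <- treeInfo(rf, tree = 1)

# Cross-tabulate node type against whether a prediction is present
print(table(terminal = info$terminal,
            has_prediction = !is.na(info$prediction)))

# Split columns are NA exactly on terminal nodes ...
split_is_na <- is.na(info$splitvarName)
print(all(split_is_na == info$terminal))

# ... and predictions are present exactly on terminal nodes
print(all(!is.na(info$prediction) == info$terminal))
```

So every row has NAs somewhere: internal nodes lack a prediction, and leaves lack children and split information.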