Return variable importance for each iteration in caret package in R

Question

I'm running a random forest model using R's caret package, and running varImp on the returned object gives me the averaged variable importance across the number of bootstrap iterations. However, I would rather assess variable importance for each iteration. Is this possible using the caret package?

Reproducible example:

library(caret)
library(party)  # provides cforest_unbiased()

mod <- train(Species ~ ., data = iris,
             method = "cforest",
             controls = cforest_unbiased(ntree = 10))
varImp(mod)

returns:

cforest variable importance

             Overall
Petal.Width  100.0000
Petal.Length  86.6279
Sepal.Length   0.5814
Sepal.Width    0.0000

What I'm interested in instead is a list whose length equals the number of bootstrap resamples, with the variable importance for each iteration. This might be possible using some combination of returnResamp = "all" and a custom summaryFunction, but I don't know how to set that up.

topepo · Accepted Answer · 2014-07-21T17:35:52

Which bootstrapping iterations do you mean? The ones used internally by cforest or the resampling done by train?

train returns the importance scores produced by the final model object, which may not be the same as the "averaged variable importance across the number of bootstrap iterations", depending on your answer to the first question.

If you want to get the resampled importance scores over train's resampling, you can trick rfe into doing it. For example:

set.seed(1)
mod <- rfe(x = iris[, 1:4], y = iris$Species, sizes = 4,
           rfeControl = rfeControl(functions = caretFuncs, 
                                   method = "boot",
                                   number = 5),
           ## pass options to train(), 
           tuneGrid = data.frame(mtry = 2),
           method = "cforest",
           controls = cforest_unbiased(ntree = 10))

Then the importance scores for each iteration are in mod$variables.
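
To get the list the question asks for, one option is to split that data frame by resample. This is only a sketch: it repeats the rfe call so it can be run on its own, and it assumes mod$variables carries a Resample column identifying the bootstrap iteration each row came from (check names(mod$variables) in your version of caret). The name imp_by_resample is just illustrative.

```r
library(caret)
library(party)  # provides cforest_unbiased()

set.seed(1)
mod <- rfe(x = iris[, 1:4], y = iris$Species, sizes = 4,
           rfeControl = rfeControl(functions = caretFuncs,
                                   method = "boot",
                                   number = 5),
           ## pass options to train()
           tuneGrid = data.frame(mtry = 2),
           method = "cforest",
           controls = cforest_unbiased(ntree = 10))

# Assumption: mod$variables has a Resample column labelling each
# bootstrap iteration; split() then yields one data frame per resample.
imp_by_resample <- split(mod$variables, mod$variables$Resample)

length(imp_by_resample)  # number of resamples found
imp_by_resample[[1]]     # importance scores from the first resample
```

Each list element should then hold the importance scores computed on one bootstrap sample, which can be compared or summarised however you like.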

Max
