I'm running a random forest model using R's caret package, and running varImp on the returned object gives me the averaged variable importance across the number of bootstrap iterations. However, I would rather assess variable importance for each iteration. Is this possible using the caret package?

Reproducible example:

library(caret)
library(party)  # provides cforest_unbiased()
mod <- train(Species ~ ., data = iris,
         method = "cforest",
         controls = cforest_unbiased(ntree = 10))
varImp(mod)

returns:

cforest variable importance

             Overall
Petal.Width  100.0000
Petal.Length  86.6279
Sepal.Length   0.5814
Sepal.Width    0.0000

What I'm interested in is rather a list, with one element per bootstrap resample, holding the variable importance for each iteration. This might be possible using some combination of returnResamp = "all" and a custom summaryFunction, but I'm not wise enough to know.

2 Answers

2
votes

Which bootstrapping iterations do you mean? The ones used internally by cforest or the resampling done by train?

train returns the importance scores produced by the final model object, which may not be the same as the "averaged variable importance across the number of bootstrap iterations", depending on your answer to the first question.

If you want to get the resampled importance scores over train's resampling, you can trick rfe into doing it. For example:

set.seed(1)
mod <- rfe(x = iris[, 1:4], y = iris$Species, sizes = 4,
           rfeControl = rfeControl(functions = caretFuncs, 
                                   method = "boot",
                                   number = 5),
           ## pass options to train(), 
           tuneGrid = data.frame(mtry = 2),
           method = "cforest",
           controls = cforest_unbiased(ntree = 10))

Then the importance scores for each iteration are in mod$variables.
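
To get a list with one element per resample, the scores can be split on the resample label. The following is only a sketch: it assumes mod$variables is a data frame with columns named var, Overall and Resample (worth confirming with str(mod$variables)), the helper name split_by_resample is made up, and the toy data frame holds invented numbers standing in for real output.

```r
# Hypothetical helper: returns one importance table per resample.
# Assumes the columns `var`, `Overall` and `Resample` are present.
split_by_resample <- function(vars) {
  stopifnot(all(c("var", "Overall", "Resample") %in% names(vars)))
  split(vars[, c("var", "Overall")], vars$Resample)
}

# Toy stand-in for mod$variables (made-up numbers, not real output)
toy <- data.frame(
  Overall  = c(0.30, 0.25, 0.01, 0.00, 0.28, 0.27, 0.02, 0.00),
  var      = rep(c("Petal.Width", "Petal.Length",
                   "Sepal.Length", "Sepal.Width"), 2),
  Resample = rep(c("Resample1", "Resample2"), each = 4)
)

by_resample <- split_by_resample(toy)
names(by_resample)  # one entry per resample label
```

With the rfe fit above, split_by_resample(mod$variables) would be the corresponding call, provided the column names match that assumption.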

Max

0
votes

After some digging around, I came up with

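getvarimp <- function(x) {
    # expects a caret train object whose final model is a party cforest
    stopifnot(is(x, "train"), is(x$finalModel, "RandomForest"))
    # copy party's varimp and make it return the raw per-tree matrix
    # (perror) instead of the column means; relies on party internals
    vi <- party:::varimp
    body(vi)[[length(body(vi))]] <- quote(return(perror))
    vi(x$finalModel)
}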

getvarimp(mod)

At least for this object type, this seems to be how varImp calculates its return value. Specifically, it takes the column means and rescales them:

vi <- colMeans(getvarimp(mod))
(vi - min(vi)) / (max(vi) - min(vi)) * 100
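
As a rough illustration of the rescaling step, here is a base-R sketch. It assumes min-max scaling to the 0-100 range, which is consistent with the 0 and 100 endpoints in the varImp output shown in the question. The per_tree matrix is made up (rows standing in for trees, columns for predictors) and rescale_importance is a hypothetical name, not a caret function.

```r
# Hypothetical helper: min-max rescale a named importance vector to 0-100
rescale_importance <- function(vi) {
  shifted <- vi - min(vi)        # smallest score becomes 0
  shifted / max(shifted) * 100   # largest score becomes 100
}

# Made-up per-tree importance matrix standing in for getvarimp(mod)
per_tree <- matrix(
  c(0.30, 0.26, 0.01, 0.00,
    0.28, 0.24, 0.00, 0.00,
    0.32, 0.28, 0.02, 0.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Petal.Width", "Petal.Length",
                          "Sepal.Length", "Sepal.Width"))
)

vi <- colMeans(per_tree)         # average over trees
scaled <- rescale_importance(vi)
round(scaled, 2)
```

Keeping the per-tree matrix around, rather than only the column means, is what lets you look at the spread of importance across trees.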

Note that each time you run this (or varImp) you may get a slightly different result because it uses some stochastic prediction each time it's run.

There may very well be other ways but I was unable to find any.