
I'm trying to run recursive feature elimination for a random forest on a data frame containing 27 predictor variables, each with 3653 values, so there are 98,631 values in total in the predictor data frame. I'm using the rfe function from the caret package.


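library(caret)   # rfe(), rfeControl(), rfFuncs

subsets <- c(1:5, 10, 15, 20, 25)

ctrl <- rfeControl(functions = rfFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE,
                   allowParallel = TRUE)

# 'outcome' is a placeholder for the response vector; rfe() needs both x and y
rfProfile <- rfe(predictors, outcome,
                 sizes = subsets,
                 rfeControl = ctrl)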
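To get a feel for why this runs so long, here is a rough count of how many forests the call above fits during resampling. It is a back-of-the-envelope sketch that assumes rfeControl's defaults (10 folds for "repeatedcv", rerank = FALSE), under which each resample fits one forest per subset size plus one on all predictors:

```r
# Rough count of random forests grown while resampling the rfe() call above
subsets      <- c(1:5, 10, 15, 20, 25)
n_resamples  <- 10 * 5                 # repeatedcv: 10 folds (default) x 5 repeats
fits_per_res <- length(subsets) + 1    # 9 subset sizes + the full 27-predictor model
total_fits   <- n_resamples * fits_per_res
total_fits   # 500, each grown with randomForest's default ntree = 500
```

The final models refit on the whole data set come on top of this, so spreading the resamples over several cores is where the time savings would come from.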

I'm using allowParallel=TRUE in rfeControl, hoping that it will run the process in parallel on my Windows machine. But I'm not sure whether it does, since I see no decrease in run time after setting allowParallel=TRUE. The process takes a very long time, and I've had to interrupt the kernel after 1-2 hours each time.
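The rfeControl() documentation describes allowParallel as using a parallel backend only if one is loaded and available, and caret runs the rfe resampling loop through foreach. A minimal sketch for checking what foreach currently sees, assuming the doParallel package is installed:

```r
library(doParallel)   # attaches foreach and parallel

# In a fresh session no backend is registered, so foreach reports one worker
# and %dopar% falls back to sequential execution
workers_before <- getDoParWorkers()

cl <- makeCluster(2, type = "PSOCK")   # socket cluster; works on Windows
registerDoParallel(cl)
workers_after <- getDoParWorkers()     # now reports the 2 cluster workers

stopCluster(cl)
registerDoSEQ()                        # reset foreach to sequential execution
cat("before:", workers_before, " after:", workers_after, "\n")
```

If getDoParWorkers() returns 1 right before calling rfe(), setting allowParallel = TRUE alone will not make anything run in parallel.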

How do I know if caret is running the RFE in parallel? Do I need to install any other parallelization packages for caret to run this process in parallel?

Any help/suggestions will be much appreciated! I'm new to the machine learning world, so it's taking me a while to figure things out.

Not a caret user, but I guess the implementation uses forking, which is not supported on Windows. For a data set of your size this should take a few minutes, unless it is also coupled with parameter tuning. A stand-alone function using foreach and doParallel (supported on Windows) could be written in ~25 lines. - Soren Havelund Welling
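Along the lines of that suggestion, here is a rough sketch of such a stand-alone function. It is hypothetical: the name rfe_parallel and its arguments are made up and are not caret's API. It assumes a numeric outcome, ranks predictors once per fold by randomForest permutation importance, and runs one cross-validation fold per foreach task:

```r
library(doParallel)     # attaches foreach and parallel
library(randomForest)

# Hypothetical helper (not part of caret): k-fold backward elimination for a
# numeric outcome. Each fold is one foreach task, so folds run in parallel.
rfe_parallel <- function(x, y, sizes, k = 10, ntree = 200) {
  sizes <- sort(unique(sizes[sizes <= ncol(x)]))
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))
  res <- foreach(f = seq_len(k), .combine = rbind,
                 .packages = "randomForest") %dopar% {
    train <- folds != f
    full <- randomForest(x[train, , drop = FALSE], y[train],
                         ntree = ntree, importance = TRUE)
    # rank predictors once per fold by permutation importance (%IncMSE)
    imp <- importance(full, type = 1)[, 1]
    ranking <- names(sort(imp, decreasing = TRUE))
    sapply(sizes, function(s) {
      keep <- ranking[seq_len(s)]
      fit <- randomForest(x[train, keep, drop = FALSE], y[train], ntree = ntree)
      pred <- predict(fit, x[!train, keep, drop = FALSE])
      sqrt(mean((y[!train] - pred)^2))   # held-out RMSE for this size
    })
  }
  res <- matrix(res, nrow = k)           # k folds x length(sizes)
  data.frame(size = sizes, RMSE = colMeans(res))
}

# Example on simulated data where only X1 carries signal
set.seed(1)
x <- data.frame(matrix(rnorm(300 * 5), ncol = 5))
y <- 2 * x$X1 + rnorm(300)

cl <- makeCluster(2, type = "PSOCK")
registerDoParallel(cl)
result <- rfe_parallel(x, y, sizes = c(1, 3, 5), k = 3, ntree = 100)
stopCluster(cl)
registerDoSEQ()
print(result)
```

The folds are drawn on the master while the forests are grown on workers with their own random streams, so repeated runs will not match exactly; caret's seeds argument in rfeControl() is meant to handle that.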
I must say, without any intent to contribute a meaningful response, that your question title was very intriguing. (And I have a nagging fear that pkg:caret is a disguised multiple comparisons engine.) - IRTFM
If you're on Windows, you can always pull up Task Manager and see whether all of your processors are actually working. I don't think this is a robust solution, but it at least gives you an idea. - Alex W
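A complementary check from inside R is to have each foreach task report the ID of the process it ran in. A minimal sketch, assuming doParallel is installed; the two-worker cluster is only for illustration:

```r
library(doParallel)

cl <- makeCluster(2, type = "PSOCK")
registerDoParallel(cl)

# Each task returns the ID of the R process that ran it. In a parallel run
# these are the worker processes, not the current session.
worker_pids <- foreach(i = 1:8, .combine = c) %dopar% Sys.getpid()
master_pid  <- Sys.getpid()

stopCluster(cl)
registerDoSEQ()
print(unique(worker_pids))
```

If the printed IDs differ from Sys.getpid() in the main session, the loop really was handed to separate worker processes, which is also what shows up as extra busy R processes in Task Manager.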
Thank you, @SorenH.Welling. I'm an R newbie, so the task of writing parallel code seems slightly daunting. But I will give it a go sometime soon! - small_world
@BondedDust Again, not adding anything to the response, but might I ask what a multiple comparison engine is? I tried searching for the meaning, but couldn't find a reliable description. - small_world

1 Answer


Try installing and registering the doParallel package prior to running rfe. This seemed to work on my Windows machine.

Here's a lengthy example adapted from the caret documentation, with timings before and after registering doParallel:

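library(caret)     # rfe(), rfeControl(), rfFuncs
data(BloodBrain)   # loads bbbDescr (predictors) and logBBB (outcome)

subsetSizes <- c(2, 4, 6, 8)
# 50 resamples + 1 final fit; each resample needs one seed per subset size plus the full model
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, length(subsetSizes) + 1)
seeds[[51]] <- sample.int(1000, 1)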
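The same seeds structure can be built for other settings. Per the rfeControl() documentation, the list has B + 1 elements for B resamples, each of the first B holding one seed per evaluated subset (including the full set) and the last a single seed for the final model. A small sketch; the helper name make_rfe_seeds is hypothetical, and the example uses the question's repeatedcv setup (10 folds x 5 repeats, 9 subset sizes):

```r
# Hypothetical helper: build the seeds list that rfeControl() accepts.
# B = number of resamples; n_sizes = length of the sizes vector.
make_rfe_seeds <- function(B, n_sizes, max_seed = 1000) {
  seeds <- vector(mode = "list", length = B + 1)
  for (i in seq_len(B)) seeds[[i]] <- sample.int(max_seed, n_sizes + 1)  # + full model
  seeds[[B + 1]] <- sample.int(max_seed, 1)                               # final fit
  seeds
}

set.seed(42)
q_seeds <- make_rfe_seeds(B = 10 * 5, n_sizes = length(c(1:5, 10, 15, 20, 25)))
```

Passing such a list as seeds = q_seeds in rfeControl() is what keeps parallel and sequential runs comparable, since each resample then sets its own seed inside whichever worker runs it.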


Run without parallel processing

system.time(rfMod <- rfe(bbbDescr, logBBB,
         sizes = subsetSizes,
         rfeControl = rfeControl(functions = rfFuncs, 
                                 seeds = seeds,
                                 number = 50)))

   user  system elapsed 
 113.32    0.44  114.43 

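Register a parallel backend:

library(doParallel)   # also loads foreach and parallel
cl <- makeCluster(detectCores(), type='PSOCK')
registerDoParallel(cl)   # without this, foreach (and so rfe) still runs sequentially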

Run with parallel processing

system.time(rfMod <- rfe(bbbDescr, logBBB,
         sizes = subsetSizes,
         rfeControl = rfeControl(functions = rfFuncs, 
                                 seeds = seeds,
                                 number = 50)))

   user  system elapsed 
   1.57    0.01   56.27
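Elapsed time roughly halves (114.43 s to 56.27 s), while user time drops to 1.57 s, presumably because the model fitting now happens in the worker processes rather than in the main session. It is also worth shutting the cluster down afterwards. A small sketch of a wrapper that always does so; the name run_in_parallel is hypothetical:

```r
library(doParallel)   # also attaches foreach and parallel

# Hypothetical helper: start a PSOCK cluster, register it with foreach,
# evaluate fun(), and always shut the workers down afterwards.
run_in_parallel <- function(fun, n_workers = max(1, detectCores() - 1, na.rm = TRUE)) {
  cl <- makeCluster(n_workers, type = "PSOCK")
  registerDoParallel(cl)
  on.exit({
    stopCluster(cl)   # free the worker processes
    registerDoSEQ()   # send later %dopar% calls back to sequential mode
  }, add = TRUE)
  fun()
}

# Example: ask foreach how many workers it sees while the cluster is up.
# With caret this could wrap the call above, e.g.
# rfMod <- run_in_parallel(function() rfe(bbbDescr, logBBB, sizes = subsetSizes,
#            rfeControl = rfeControl(functions = rfFuncs, seeds = seeds, number = 50)))
n_seen <- run_in_parallel(function() getDoParWorkers(), n_workers = 2)
```

Leaving one core free by default keeps the machine responsive while the forests are being grown.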