I'm trying to run recursive feature elimination (RFE) for a random forest on a data frame of 27 predictor variables, each with 3653 observations, so the predictor data frame holds 98631 values in total. I'm using the rfe function from the caret package.
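    library(caret)
    library(randomForest)

    # Candidate subset sizes (number of predictors kept) to evaluate
    subsets <- c(1:5, 10, 15, 20, 25)

    set.seed(10)
    ctrl <- rfeControl(functions = rfFuncs,      # random forest helper functions
                       method = "repeatedcv",    # repeated k-fold cross-validation
                       repeats = 5,              # 5 repeats (10 folds each by default)
                       verbose = FALSE,
                       allowParallel = TRUE)

    # predictors: data frame of 27 predictors; y: the response vector
    rfProfile <- rfe(predictors,
                     y,
                     sizes = subsets,
                     rfeControl = ctrl)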
I set allowParallel=TRUE in rfeControl, hoping that the process would run in parallel on my Windows machine. I'm not sure that it does, because the run time did not decrease after I set it. The process takes a very long time, and each time I've had to interrupt the kernel after 1-2 hours.
How do I know if caret is running the RFE in parallel? Do I need to install any other parallelization packages for caret to run this process in parallel?
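For reference, caret hands its parallel work to the foreach package, so allowParallel=TRUE only has an effect once a parallel backend has been registered. Below is a minimal sketch of one way to register a backend and check it, assuming the doParallel package is installed. The variable names (n_cores, n_workers, cl, backend_name, workers_seen) are illustrative only, and this is not the only possible backend.

```r
# Sketch only: assumes the doParallel package (and its dependency foreach) is installed.
library(doParallel)

# Leave one core free; fall back to a single worker if the core count is unknown.
n_cores <- parallel::detectCores()
n_workers <- if (is.na(n_cores)) 1 else max(1, n_cores - 1)

# PSOCK clusters work on Windows, where forking is not available.
cl <- parallel::makePSOCKcluster(n_workers)
registerDoParallel(cl)

# Ask foreach which backend is registered and how many workers it will use.
backend_name <- foreach::getDoParName()
workers_seen <- foreach::getDoParWorkers()
print(backend_name)
print(workers_seen)

# ... the rfe() call would go here, while the cluster is still registered ...

# Shut the workers down and return foreach to sequential execution.
parallel::stopCluster(cl)
foreach::registerDoSEQ()
```

If the reported worker count is 1 at the point where rfe() is called, the resampling loop is running sequentially regardless of allowParallel.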
Any help/suggestions will be much appreciated! I'm new to the machine learning world, so it's taking me a while to figure things out.