I would like to make use of the 20 CPU cores I have at hand to train random forests in R. My usual code, using the randomForest package, looks like this:
rf = randomForest(Pred~., train, ntree=100, importance=TRUE)
rf
So I train a forest with 100 trees on a factor Pred with 11 levels and a data frame train with 74 numeric features and ~84k observations.
The idea was to speed this up using caret (code derived from this example):
library(caret)
library(doParallel)

cluster <- makeCluster(19)  # leave one core free for the OS
registerDoParallel(cluster)
trainctrl <- trainControl(method="none", number=1, allowParallel=TRUE)
fit <- train(Pred~., train, method="parRF", trControl=trainctrl, ntree=100)
stopCluster(cluster)
registerDoSEQ()
fit
I replaced method="cv" from the example with method="none", as I do want to train on the whole training set (see the documentation). However, I do not get an accuracy from fit; fit$results is empty. If I set method="oob" instead, an optimization of mtry is done, which also gives me accuracies.
Is there a way to simply run the first code snippet in parallel using caret without any hyperparameter optimizations?
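For reference, a minimal sketch of one way to do this: caret's train() with method = "none" expects exactly one candidate parameter set, supplied via tuneGrid, so fixing mtry yourself should skip all tuning. The value mtry = 8 below is an assumption, chosen to mimic randomForest's classification default floor(sqrt(74)).

```r
library(caret)
library(doParallel)

cluster <- makeCluster(19)
registerDoParallel(cluster)

# method = "none" disables resampling; train() then needs a single-row
# tuneGrid instead of searching over mtry
trainctrl <- trainControl(method = "none", allowParallel = TRUE)
fit <- train(Pred ~ ., train, method = "parRF",
             trControl = trainctrl,
             tuneGrid = data.frame(mtry = 8),  # randomForest's default floor(sqrt(74))
             ntree = 100)

stopCluster(cluster)
registerDoSEQ()
```

Note that without resampling, caret has no accuracy estimate to report; method = "oob" remains the cheapest option if you want one.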
Have a look at the ranger package. That runs in parallel out of the box. There is also method = "ranger" in caret. – phiver
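The suggestion above can be sketched as follows (assuming ranger's formula interface; importance = "impurity" stands in for randomForest's importance = TRUE, since ranger expects a mode name rather than a logical):

```r
library(ranger)

# ranger grows trees in parallel internally; num.threads controls the
# number of CPU threads used
rf <- ranger(Pred ~ ., data = train,
             num.trees = 100,
             importance = "impurity",
             num.threads = 20)
rf
```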