Recently I have been learning how to use the mlr3 package with parallelization. According to the mlr3 book (https://mlr3book.mlr-org.com/technical.html) and this tutorial (https://www.youtube.com/watch?v=T43hO2o_nZw&t=1s), mlr3 uses the future package's backends for parallelization. I ran a simple test with the following code:
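# load the packages
library(future)
library(future.apply)
library(mlr3)
library(mlr3learners)  # provides the "classif.ranger" learner (needs the ranger package)
# set the task; `train` is the training data.frame with the binary factor target `r_yn`
task_train <- TaskClassif$new(id = "survey_train", backend = train, target = "r_yn", positive = "yes")
# set the learner; AUC requires probability predictions, so predict_type = "prob"
learner_ranger <- lrn("classif.ranger", predict_type = "prob")
# set the 5-fold cross-validation
cv_5 <- rsmp("cv", folds = 5)
# run the resampling in parallel on 5 background R sessions
plan(multisession, workers = 5)
task_train_cv_5_par <- resample(task = task_train, learner = learner_ranger, resampling = cv_5)
# switch back to sequential processing
plan(sequential)
task_train_cv_5_par$aggregate(msr("classif.auc"))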
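A minimal sketch for checking whether seeding the main R session is enough: run the same parallel resampling twice after the same `set.seed()` call and compare the AUCs. It swaps mlr3's built-in `sonar` task in for `train`, and `run_cv()` is a hypothetical helper. It also rests on an assumption not stated in the sources above: that mlr3 passes `future.seed = TRUE` to future internally, so the workers' RNG streams are derived from the parent session's seed.

```r
library(future)
library(mlr3)
library(mlr3learners)  # provides classif.ranger (needs the ranger package)

# Built-in binary task used as a stand-in for the `train` data
task <- tsk("sonar")
learner <- lrn("classif.ranger", predict_type = "prob")
cv_5 <- rsmp("cv", folds = 5)

# Hypothetical helper: seed the main session, resample, return the AUC.
# Assumption: mlr3 derives the workers' RNG streams from this seed
# (future.seed = TRUE internally).
run_cv <- function(seed) {
  set.seed(seed)
  rr <- resample(task = task, learner = learner, resampling = cv_5)
  rr$aggregate(msr("classif.auc"))
}

# Two parallel runs with the same seed
plan(multisession, workers = 2)
auc_par_1 <- run_cv(2023)
auc_par_2 <- run_cv(2023)

# One sequential run with the same seed, for comparison
plan(sequential)
auc_seq <- run_cv(2023)

print(c(auc_par_1, auc_par_2, auc_seq))
```

If `auc_par_1` and `auc_par_2` agree, the seed set before `resample()` is reaching the workers; comparing them with `auc_seq` additionally shows whether the result depends on the chosen plan.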
The AUC changes on every run, which I assume is because I have not set a random seed for the parallel workers. Many tutorials on the future package say that the way to get reproducible results is to use future_lapply() from the future.apply package with future.seed = TRUE. Another approach is to register a future backend for a foreach loop and use %dorng% or registerDoRNG() from the doRNG package.
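For comparison, the two approaches mentioned above can be sketched as a toy example, with `runif()` standing in for real work. It assumes the doFuture package as the future backend for foreach (registered via `registerDoFuture()`); the names `fa_*` and `fe_*` are illustrative only.

```r
library(future)
library(future.apply)
library(foreach)
library(doFuture)  # registers future as a foreach backend
library(doRNG)     # provides the %dorng% operator

plan(multisession, workers = 2)

# future.apply: future.seed = TRUE gives each element its own
# L'Ecuyer-CMRG stream derived from the current RNG state
set.seed(42)
fa_1 <- future_lapply(1:5, function(i) runif(1), future.seed = TRUE)
set.seed(42)
fa_2 <- future_lapply(1:5, function(i) runif(1), future.seed = TRUE)

# foreach: %dorng% precomputes one RNG stream per iteration from the seed
registerDoFuture()
set.seed(42)
fe_1 <- foreach(i = 1:5) %dorng% runif(1)
set.seed(42)
fe_2 <- foreach(i = 1:5) %dorng% runif(1)

plan(sequential)

# unlist() drops the "rng" attribute that %dorng% attaches to its result
print(identical(fa_1, fa_2))
print(identical(unlist(fe_1), unlist(fe_2)))
```

In both cases the reproducibility comes from deriving per-task streams from the seed in the main session, rather than from seeding each worker by hand.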
My question: how can I get reproducible resampling results in mlr3 without using future_lapply() or foreach? I suspect there is a simple way to do this. Thanks a lot!