This is basically the same question as "How to set .libPaths (checkpoint) on workers when running parallel computation in R", but now for parallelized mlr model fits. I understand that I need to use parallelMap with mlr, but how can I make sure that each worker uses the correct .libPaths()?

remove.packages("mlr")
remove.packages("rpart")

checkpoint::checkpoint("2018-09-01",
                       scanForPackages = TRUE)

library(mlr)
library(parallelMap)
library(rpart)
parallelStartSocket(2L)

task = makeClassifTask(data = iris, target = "Species")
learner = makeLearner("classif.rpart", minsplit = 7, predict.type = "prob")
mod = resample(learner, task, resampling = cv5)

parallelStop()

Error in parallelLibrary("mlr", master = FALSE, level = "mlr.resample", : Packages could not be loaded on all slaves: mlr.

Session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rpart_4.1-13      parallelMap_1.3   mlr_2.13          ParamHelpers_1.11

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18      pillar_1.3.0      compiler_3.5.1    plyr_1.8.4        bindr_0.1.1       tools_3.5.1
 [7] tibble_1.4.2      gtable_0.2.0      checkmate_1.8.5   lattice_0.20-35   pkgconfig_2.0.2   rlang_0.2.2
[13] Matrix_1.2-14     fastmatch_1.1-0   rstudioapi_0.7    yaml_2.2.0        parallel_3.5.1    bindrcpp_0.2.2
[19] dplyr_0.7.6       grid_3.5.1        tidyselect_0.2.4  glue_1.3.0        data.table_1.11.4 R6_2.2.2
[25] XML_3.98-1.16     survival_2.42-3   ggplot2_3.0.0     purrr_0.2.5       magrittr_1.5      backports_1.1.2
[31] scales_1.0.0      BBmisc_1.11       splines_3.5.1     assertthat_0.2.0  checkpoint_0.4.3  colorspace_1.3-2
[37] stringi_1.1.7     lazyeval_0.2.1    munsell_0.5.0     crayon_1.3.4

The code you've posted works fine for me. - Lars Kotthoff
OK, let me explain in more detail. checkpoint changes the path where R searches for installed packages. I have installed the packages in the checkpoint folder, so the code runs fine when it is not parallelized. When it runs in parallel, however, fresh R sessions are started in which .libPaths() is the default library path rather than the checkpoint path. The workers therefore look for packages in a folder where I don't have them installed, which raises the error. If you remove mlr from your default .libPaths() folder, you should see the same error. I hope that makes it clear. - needRhelp
Can you please post code that reproduces the error? - Lars Kotthoff
The code is reproducible. I have added the session info in case this is platform dependent. - needRhelp
I have run the code you've posted and I do not get the error. - Lars Kotthoff
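
One way to check the explanation given in the comments is to compare the library paths of the master session with those reported by freshly started workers. The following is only a minimal sketch: it uses the base parallel package directly rather than parallelMap, on the assumption that parallelStartSocket starts the same kind of socket workers, and the variable names are illustrative.

```r
library(parallel)

# Library paths of the master session. After checkpoint::checkpoint()
# these would be expected to start with the checkpoint folder.
master_paths <- .libPaths()

# Start two fresh worker sessions over sockets.
cl <- makePSOCKcluster(2L)

# Ask every worker for its own library paths.
worker_paths <- clusterCall(cl, function() .libPaths())

stopCluster(cl)

print(master_paths)
str(worker_paths)
```

If the first entry differs between master and workers, the workers are resolving packages from a different library than the master.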

1 Answer

It works for me if I prepend the checkpoint directory to the R_LIBS environment variable, so that it also becomes part of the default library path of the worker sessions.

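# Remove the packages from the default library to reproduce the setup from the question.
remove.packages("mlr")
remove.packages("rpart")

print(.libPaths())  # library paths before checkpoint
checkpoint::checkpoint("2018-09-01",
                       scanForPackages = TRUE)
print(.libPaths())  # the checkpoint folder should now come first
# Prepend the checkpoint folder to R_LIBS so that the workers started below inherit it.
Sys.setenv(R_LIBS = paste(.libPaths()[1], Sys.getenv("R_LIBS"), sep = .Platform$path.sep))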

library(mlr)
library(parallelMap)
library(rpart)
parallelStartSocket(2L)

task = makeClassifTask(data = iris, target = "Species")
learner = makeLearner("classif.rpart", minsplit = 7, predict.type = "prob")
mod = resample(learner, task, resampling = cv5)

parallelStop()
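
To see the effect of R_LIBS in isolation, without checkpoint or mlr, something like the following sketch could be used. It relies only on the base parallel package; extra_lib is a hypothetical stand-in for the checkpoint folder (here simply the session's temporary directory), and the assumption is that socket workers inherit the environment variables of the master session.

```r
library(parallel)

# Hypothetical stand-in for the checkpoint library folder.
extra_lib <- normalizePath(tempdir(), winslash = "/")

# Prepend the folder to R_LIBS, remembering the previous value.
old_r_libs <- Sys.getenv("R_LIBS")
Sys.setenv(R_LIBS = paste(extra_lib, old_r_libs, sep = .Platform$path.sep))

# Workers started from now on read R_LIBS when they start up.
cl <- makePSOCKcluster(1L)
worker_paths <- clusterCall(cl, function() .libPaths())[[1]]
stopCluster(cl)

# Restore the previous value of R_LIBS.
Sys.setenv(R_LIBS = old_r_libs)

print(extra_lib)
print(worker_paths)
```

If the assumption holds, the worker's library paths should include the extra folder, possibly with a differently normalized spelling of the path.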