3 votes

I'm working on an R package, and I need to run a function myfun on a cluster using parallel::parLapply. myfun calls several additional functions from my package, which in turn call more functions, some of which have multiple methods... so passing all of the functions and methods to the cluster explicitly by name is very cumbersome.

The standard advice, as I understand it, is to run parallel::clusterEvalQ(cl, library("my_package")) on the cluster. But calling library("my_package") from within my own package's code is apparently anathema to R-CMD-check. And I have reason to believe that my_package:::function won't fly on CRAN either.
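
For concreteness, here's roughly what that approach looks like; this is just a minimal sketch, and "my_package", myfun(), and the cluster size are placeholders:

library(parallel)

cl <- makeCluster(4)

# Attach the package on every worker so that myfun() and everything it calls
# (including unexported helpers and their methods) can be found there.
clusterEvalQ(cl, library("my_package"))

res <- parLapply(cl, seq_len(10), my_package::myfun)

stopCluster(cl)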

What is the standard approach here? Do I need to export every single relevant function and method by name?
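
For context, the explicit alternative I'd like to avoid looks something like the following, where helper1 and helper2 (and its methods) are hypothetical stand-ins for the internal functions myfun depends on:

parallel::clusterExport(cl,
                        c("myfun", "helper1", "helper2",
                          "helper2.default", "helper2.data.frame"),
                        envir = asNamespace("my_package"))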

Internal functions may not be available to users of the package, but they can easily be called by any function inside the package. Have you actually tested the code with parallel? – Sinh Nguyen
@SinhNguyen the question is how to export the function to the cluster. I have of course tested the code, but I don't really know what I'm doing, since I mostly work on a Mac and do multicore stuff by forking. – Jacob Socolar
@SinhNguyen Note also that the challenge isn't getting a working implementation locally; the challenge is getting an implementation that passes R-CMD-check. – Jacob Socolar
The R CMD check thing is only really important if you want to submit your package to CRAN; it's not like the world will end just because you get a warning. – Hong Ooi
I do want my package on CRAN :) – Jacob Socolar

1 Answer

0 votes

Ok, this seems to work (it passes R-CMD-check on GitHub):

# Export every function defined in the package namespace (including
# unexported helpers) to the workers in cluster `cl`.
parallel::clusterExport(cl = cl,
                        varlist = unclass(lsf.str(envir = asNamespace("my_package"),
                                                  all.names = TRUE)),
                        envir = asNamespace("my_package"))

Hope it's useful to others.
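
For anyone who wants to see it in context, here's a rough end-to-end sketch; myfun and "my_package" are placeholders, and it assumes this code runs inside the package so that myfun is in scope:

cl <- parallel::makeCluster(2)

# Copy every function in the package namespace onto the workers, so that the
# internal helpers resolve when myfun() calls them there.
parallel::clusterExport(cl,
                        unclass(lsf.str(envir = asNamespace("my_package"),
                                        all.names = TRUE)),
                        envir = asNamespace("my_package"))

res <- parallel::parLapply(cl, 1:10, myfun)
parallel::stopCluster(cl)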

There's also probably a nifty solution available via the globals package, but I haven't been able to get this to pass checks on GitHub Actions.