
I was looking for an alternative to furrr::future_map() because, when this function is run inside another function, it copies all objects defined inside that function to each worker, regardless of whether those objects are explicitly passed (https://github.com/DavisVaughan/furrr/issues/26).

It looks like parLapply() does the same thing when using clusterExport():

fun <- function(x) {
  big_obj <- 1
  cl <- parallel::makeCluster(2)
  parallel::clusterExport(cl, c("x"), envir = environment())
  parallel::parLapply(cl, c(1), function(x) {
    x + 1
    env <- environment()
    parent_env <- parent.env(env)
    return(list(this_env = env, parent_env = parent_env))
  })
}

res <- fun(1)
names(res[[1]]$parent_env)
#> [1] "cl"      "big_obj" "x"

Created on 2020-01-06 by the reprex package (v0.3.0)

How can I keep big_obj from getting copied to each worker? I am using a Windows machine so forking isn't an option.

On Windows, you have to copy the data. The only way to avoid copying is to not hold the data in memory at all, i.e. store it on disk and load only the subset you need to work on. - F. Privé
I came across this post: stackoverflow.com/questions/35851761/…. It seems the issue I describe comes from defining the worker function inside another function rather than in the global environment. - Giovanni Colitti

1 Answer


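A function defined inside fun() carries fun()'s environment with it, so everything in that environment, including big_obj, is sent to the workers along with the function. You can change the environment of your local function so that it no longer includes big_obj, e.g. by assigning it the base environment.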
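
As a rough check that does not need a cluster, the sketch below compares the serialized size of a function before and after its environment is reset. It assumes the function is serialized when it is shipped to the workers, so the serialized size is used here as a stand-in for the amount of data sent. The names make_workers, worker and trimmed are hypothetical and only serve the illustration.

```r
# Hypothetical helper: builds the same worker function twice, once with the
# enclosing environment attached and once with it reset to baseenv().
make_workers <- function() {
  big_obj <- numeric(1e6)            # stand-in for a large object (1e6 doubles)
  worker <- function(x) x + 1        # closure: its environment holds big_obj
  trimmed <- worker
  environment(trimmed) <- baseenv()  # drop the link to make_workers()'s environment
  list(worker = worker, trimmed = trimmed)
}

fns <- make_workers()

# Serialized size in bytes, used as a proxy for what is sent to a worker.
size_with_env <- length(serialize(fns$worker, NULL))
size_trimmed  <- length(serialize(fns$trimmed, NULL))

# size_with_env should be far larger, because big_obj travels with the closure.
print(c(with_env = size_with_env, trimmed = size_trimmed))
```

Note that a function whose environment is baseenv() can only see base R functions by name, so anything else it uses has to be called with an explicit namespace (pkg::fun) or passed in as an argument. Applied to the function from the question, the same reset looks like this: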

fun <- function(x) {
  big_obj <- 1
  cl <- parallel::makeCluster(2)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::clusterExport(cl, c("x"), envir = environment())
  local_fun <- function(x) {
    x + 1
    env <- environment()
    parent_env <- parent.env(env)
    return(list(this_env = env, parent_env = parent_env))
  }
  environment(local_fun) <- baseenv()
  parallel::parLapply(cl, c(1), local_fun)
}

res <- fun(1)
"big_obj" %in% names(res[[1]]$parent_env) # FALSE