
I am currently using the parallel package in R, and I am trying to make my work reproducible by setting seeds.

However, setting the seed before creating the cluster and running the tasks in parallel does not make the results reproducible. I think I need to set the seed for each core when I create the cluster.

I have made a small example here to illustrate my problem:

library(parallel)

# function to generate 2 uniform random numbers
runif_parallel <- function() {
  # make cluster of two cores
  cl <- parallel::makeCluster(2)

  # sample uniform random numbers
  samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) runif(1))

  # close cluster
  parallel::stopCluster(cl)

  return(unlist(samples))
}

set.seed(41)
test1 <- runif_parallel()

set.seed(41)
test2 <- runif_parallel()

# they should be the same since they have the same seed
identical(test1, test2)

In this example, test1 and test2 should be identical because both runs use the same seed, but they return different results.

Can I get some help with where I'm going wrong please?

Note that I've written this example this way to mimic how I'm using it right now; there are probably cleaner ways to generate two uniform random numbers in parallel.
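The idea of setting the seed for each core when the cluster is created can be sketched with parallel::clusterSetRNGStream(). It switches the workers to the "L'Ecuyer-CMRG" generator and gives each one its own stream derived from a single seed. This is a minimal sketch that assumes static scheduling with parLapply() instead of parLapplyLB(). With load balancing, which worker picks up which element is not guaranteed once there are more elements than workers. The function name runif_parallel_streams is hypothetical.

```r
library(parallel)

# Hypothetical helper: seed every worker's RNG stream right after the
# cluster is created, so the draws depend only on `iseed`.
runif_parallel_streams <- function(iseed) {
  cl <- parallel::makeCluster(2)
  on.exit(parallel::stopCluster(cl))  # always shut the workers down

  # one independent L'Ecuyer-CMRG stream per worker, all derived from iseed
  parallel::clusterSetRNGStream(cl, iseed = iseed)

  # parLapply (static scheduling) sends element 1 to worker 1 and element 2
  # to worker 2 on every call, so each element always uses the same stream
  samples <- parallel::parLapply(cl, X = 1:2, fun = function(i) runif(1))
  unlist(samples)
}

test1 <- runif_parallel_streams(41)
test2 <- runif_parallel_streams(41)

identical(test1, test2)  # TRUE
```

Note that the seed is passed as an argument here. A set.seed() call in the master session before the function would not affect the workers.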

Comment (Not_Dave): set seed inside function(i)

1 Answer


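You need to call set.seed() within each job. The workers started by makeCluster() are separate R processes, so a seed set in your own session never reaches them. Here is a reproducible example: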

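cl <- parallel::makeCluster(2)

# seed every worker once (not strictly needed here, because each job
# below calls set.seed(i) before drawing)
invisible(parallel::clusterEvalQ(cl, set.seed(41)))

# sample uniform random numbers; each job reseeds with its own index,
# so element i always gets the same draw
samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) { set.seed(i); runif(1) })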
samples

# [[1]]
# [1] 0.2655087
# 
# [[2]]
# [1] 0.1848823

samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i){set.seed(i);runif(1)})
samples

# [[1]]
# [1] 0.2655087
# 
# [[2]]
# [1] 0.1848823

parallel::stopCluster(cl)
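If you want to keep the structure of the question, where set.seed(41) is called before runif_parallel(), one option is to draw one seed per task in the master session and pass those seeds to the workers. This is a minimal sketch under that assumption. The names runif_parallel_seeded and task_seeds are hypothetical and not part of the parallel package. Because each task reseeds itself, the output does not depend on which worker picks up which task, so it stays reproducible with parLapplyLB() even when there are more tasks than workers.

```r
library(parallel)

# Hypothetical variant of runif_parallel(): the master session draws one
# seed per task, so set.seed() before the call controls the result.
runif_parallel_seeded <- function(n = 2) {
  # uses the master's RNG, which set.seed() in the calling session controls
  task_seeds <- sample.int(.Machine$integer.max, n)

  cl <- parallel::makeCluster(2)
  on.exit(parallel::stopCluster(cl))  # always shut the workers down

  samples <- parallel::parLapplyLB(cl, X = task_seeds, fun = function(s) {
    set.seed(s)  # seed depends only on the task, not on which worker runs it
    runif(1)
  })
  unlist(samples)
}

set.seed(41)
test1 <- runif_parallel_seeded()

set.seed(41)
test2 <- runif_parallel_seeded()

identical(test1, test2)  # TRUE
```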