I am running a process in parallel in R using foreach with the doParallel backend. I register a set of 20 cores as a cluster and run the process about 100 times. I pass a matrix to each iteration, and inside the sub-process I replace the matrix with a random sample of its own rows. What I'm wondering is: should I expect this modification to persist for subsequent iterations handled by the same child process? E.g., when child process 1 finishes its first iteration, does it start the second iteration with the original matrix, or with the random sample?
A minimal example:
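library(doParallel)
X <- matrix(1:400, ncol = 4)
cl <- makeCluster(2)
# clusterExport() takes the cluster first, then the variable names as strings
clusterExport(cl, "X")
registerDoParallel(cl)
results <- foreach(i = 1:100) %dopar% {
  set.seed(12345)
  # overwrite X on the worker with a bootstrap sample of its own rows
  X <- X[sample.int(nrow(X), replace = TRUE), ]
  X
}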
EDIT:
To be clear, if the object does persist across iterations handled by the same worker process, that is not the behavior I want. Rather, I want each iteration to take a fresh random sample of the original matrix, not a random sample of the most recent random sample. (I recognize that my minimal example would also produce the same random sample of the original matrix every time because of the seed being set; in my actual application I deal with this.)
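For reference, here is a rough sketch of the behavior I'm after. It sidesteps the question rather than answering it: the sample is stored under a different name (`Xb` and `idx` are just names I made up for illustration), so `X` is never assigned to inside the loop body. I'm assuming here that foreach ships `X` to the workers on its own because the body refers to it, so the sketch has no explicit `clusterExport()` call, and I've left out the seed handling.

```r
library(doParallel)

X <- matrix(1:400, ncol = 4)

cl <- makeCluster(2)
registerDoParallel(cl)

results <- foreach(i = 1:100) %dopar% {
  # Draw row indices with replacement from the original matrix.
  idx <- sample.int(nrow(X), replace = TRUE)
  # Store the sample under a new (hypothetical) name; X is only read, never assigned.
  Xb <- X[idx, , drop = FALSE]
  Xb
}

stopCluster(cl)
```

Since nothing in the body assigns to `X`, every iteration samples from the matrix as it was sent to the worker. I would still like to understand what happens in the version above where `X` is overwritten.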