2
votes

I want to create a function that uses assignments to store intermediate output (p). This intermediate output is used in the statements below it. I want everything to be parallelized using doSNOW and foreach, and I do NOT want that intermediate output to be communicated between iterations of the foreach loop. I don't want to store intermediate output in a list (e.g. p[[i]]) because then I would have to change a huge amount of code.

  • Question 1: Is there any danger that another iteration of the foreach loop will use the intermediate output (p)?
  • Question 2: If yes, when would there be danger of that happening and how to prevent it?

Here is an example of what I mean:

install.packages('foreach')
library('foreach')

install.packages('doSNOW')
library('doSNOW')

NbrCores <- 4
cl <- makeCluster(NbrCores)
registerDoSNOW(cl)

test <- function(value){
   foreach(i=1:500) %dopar% {
      #some statement based on parameter 'value'
      p <- value
      #some statement that uses p
      v <- p
      #other statements
   }
}

test(value=1)

#shut the workers down when done
stopCluster(cl)
Usually the request is the opposite: to establish some sort of communication, which is hard because isolation of results is usually enforced. - IRTFM
@DWin that is indeed often the request. Just to be clear: I want no communication at all. - user1134616
I don't think there is any danger. The worker processes should not share memory, and R does not change variables in place anyway. If you want other, more informed opinions, the right place to pose the question is the R High Performance Computing SIG mailing list. - IRTFM

1 Answer

0
votes

As far as I know, each of the nodes used in a parallel computation runs in its own R process, so there is no risk of variables on one node influencing the results on another. Communication between processes is possible in general, but foreach does not set any up: it only iterates over the sequence it is given and evaluates each item independently on one of the nodes.
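
If you would rather have the locality of p hold by construction than rely on how the backend evaluates each task, one option is to wrap the loop body in local(). This is only a minimal sketch: the two workers, the ten iterations, and the arithmetic used for p and v are assumptions made up for illustration, not part of the question. The existing statements stay as they are; they just move inside local().

```r
library(foreach)
library(doSNOW)

cl <- makeCluster(2)   # two worker processes; the count is arbitrary
registerDoSNOW(cl)

test <- function(value) {
  foreach(i = 1:10, .combine = c) %dopar% {
    local({
      # p and v live in a fresh environment created for this call to local(),
      # so they are discarded as soon as the iteration finishes
      p <- value + i   # placeholder for "some statement based on 'value'"
      v <- p * 2       # placeholder for "some statement that uses p"
      v                # last value is what the iteration returns
    })
  }
}

res <- test(value = 1)
stopCluster(cl)
print(res)
```

Because local() evaluates its body in a new environment on every call, nothing assigned inside it can be seen by a later iteration, whichever worker that iteration happens to run on.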