I'm using doSNOW and foreach to send a bunch of messages in parallel, but the extra output clutters my logs and makes them difficult to parse. A simple example:

library(foreach)
library(doSNOW)

numbers <- 1:6
cool_print <- function(x) print(paste0(x, " is cool"))
cl <- makeCluster(2, outfile="") # "" passes messages to standard out 
registerDoSNOW(cl)
sns_responses <- foreach(x = numbers) %dopar% {
  cool_print(x)
}
stopCluster(cl)

Outputs the following:

Type: EXEC 
Type: EXEC 
Type: EXEC 
Type: EXEC 
[1] "1 is cool"
[1] "2 is cool"
Type: EXEC 
[1] "3 is cool"
Type: EXEC 
[1] "4 is cool"
Type: EXEC 
[1] "5 is cool"
Type: EXEC 
[1] "6 is cool"
>  stopCluster(cl)
Type: DONE 
Type: DONE 

I just want it to print all of the " is cool" statements.

Is this possible? Is there an option I can set in snow that allows printing from the workers but suppresses the "Type: EXEC" and "Type: DONE" lines?

It doesn't seem to be possible: the message is printed unconditionally by a function in the snow package (source code). - count orlok
Not that it makes a difference here, but if you don't have specific reasons for using the 'snow' package, I recommend that you move to the 'parallel' package instead. The 'snow' package was basically incorporated as-is into the 'parallel' package in R 2.14.0, and 'snow' can more or less be considered "deprecated" these days. - HenrikB
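
As a rough illustration of the 'parallel' suggestion above, one possible workaround is to have the workers return their messages as values and print them in the main R process, so nothing is written by the workers at all. This is only a sketch: `cool_msg` is a hypothetical variant of `cool_print` that returns the string instead of printing it, and the output appears only after all tasks have finished.

```r
library(parallel)  # ships with R, no extra install needed

numbers <- 1:6

# Hypothetical variant of cool_print(): return the message instead of printing it
cool_msg <- function(x) paste0(x, " is cool")

cl <- makeCluster(2)                     # no outfile argument, workers stay silent
res <- parLapply(cl, numbers, cool_msg)  # results come back in input order
stopCluster(cl)

# Print on the main process only
for (m in res) print(m)
```

Because the printing happens in the main process, the log contains only the messages themselves.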

1 Answer

See my comment on "Why don't parallel jobs print in RStudio?" for why using outfile = "" is most likely not a solution. It's more of a hack that doesn't really work.

Below is a solution using the future framework (disclaimer: I'm the author) that resembles your example as closely as possible and at the same time relays the output produced on the workers to the main R process:

library(foreach)
library(doFuture)
registerDoFuture()

cl <- parallel::makeCluster(2)
plan(cluster, workers=cl)

cool_print <- function(x) print(paste0(x, " is cool"))

numbers <- 1:6
sns_responses <- foreach(x=numbers) %dopar% {
  cool_print(x)
}

parallel::stopCluster(cl)

Note that the output is relayed only when a worker has finished its task, and always in the same order as when running sequentially, so there's a delay. If you're after progress updates, you can use the progressr package (disclaimer: I'm also the author here), e.g.

library(foreach)
library(doFuture)
library(progressr)
registerDoFuture()

cl <- parallel::makeCluster(2)
plan(cluster, workers=cl)

cool_print <- function(x) print(paste0(x, " is cool"))

numbers <- 1:6

with_progress({
  p <- progressor(along=numbers)             
  sns_responses <- foreach(x=numbers) %dopar% {
    p(paste0(x, " is cool"))
    cool_print(x)
  }
})

parallel::stopCluster(cl)
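
The delayed relay of worker output described above can also be seen with a single future, without foreach. The following is a minimal sketch, assuming the future package is installed; it uses the `multisession` backend rather than an explicit cluster, which is a substitution made here for brevity and not part of the examples above.

```r
library(future)

plan(multisession, workers = 2)

# The print() call runs on a background worker; its output is captured there
f <- future({
  print("1 is cool")
  1
})

# The captured output is relayed to the main R process only when the
# value is collected
v <- value(f)

plan(sequential)  # shut down the background workers
```

Nothing is printed when the future is created; the message shows up in the main session at the `value()` call.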