This is basically a follow-up to this more specialized question. There have been some posts about the creation of zombie processes when doing parallel computing in R:

- How to stop R from leaving zombie processes behind
- How to kill a doMC worker when it's done?
- Remove zombie processes using parallel package

There are several ways of doing parallel computing, and I will focus on the three that I have used so far on a local machine. I used `doMC` and `doParallel` with the `foreach` package on a local computer with 4 cores:
(a) Registering a fork cluster:

```r
library(doParallel)
cl <- makeForkCluster(4)
# equivalently: cl <- makeForkCluster(nnodes = getOption("mc.cores", 4L))
registerDoParallel(cl)
out <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
}
stopCluster(cl)
```
(b) Registering a PSOCK cluster:

```r
library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
out <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
}
stopCluster(cl)
```
(c) Using `doMC`:

```r
library(doMC)
library(doParallel)
registerDoMC(4)
out <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
}
```
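To check whether a method left zombies behind, I look at the R session's direct children from within R. This is just a small helper I use, not part of any of the packages above; the `ps` flags assume a GNU/Linux `ps`:

```r
# List this R session's direct children together with their state
# (GNU/Linux ps). A state of "Z" (defunct) marks a zombie: the child
# has exited but has not yet been reaped with waitpid() by the parent.
show_children <- function() {
    system(sprintf("ps --ppid %d -o pid,stat,comm", Sys.getpid()))
}
show_children()
```

Running this right after each of the three variants is how I noticed the zombies in the first place.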
Several users have observed that the `doMC` method -- which is just a wrapper around the `mclapply` function, so it's not `doMC`'s fault (see here: How to kill a doMC worker when it's done?) -- leaves zombie processes behind. In an answer to a previous question (How to stop R from leaving zombie processes behind) it was suggested that using a fork cluster might not leave zombie processes behind, and in another question (Remove zombie processes using parallel package) the same was suggested for a PSOCK cluster. However, it seems that all three methods leave zombie processes behind. Zombie processes per se are usually not a problem, because they normally do not consume resources, but they do clutter the process tree. I can get rid of them by closing and re-opening R, but that is not the best option when I'm in the middle of a session.

Is there an explanation why this happens (or even: is there a reason why this has to happen)? And is there something to be done so that no zombie processes are left behind?
My system info (R is used in a simple REPL session with xterm and tmux):
```r
> library(devtools)
> session_info()
Session info-------------------------------------------------------------------
 setting  value
 version  R Under development (unstable) (2014-08-16 r66404)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_IE.UTF-8
 tz       <NA>
Packages-----------------------------------------------------------------------
 package    * version  source
 codetools    0.2.8    CRAN (R 3.2.0)
 devtools   * 1.5.0.99 Github (c429ae2)
 digest       0.6.4    CRAN (R 3.2.0)
 doMC       * 1.3.3    CRAN (R 3.2.0)
 evaluate     0.5.5    CRAN (R 3.2.0)
 foreach    * 1.4.2    CRAN (R 3.2.0)
 httr         0.4      CRAN (R 3.2.0)
 iterators  * 1.0.7    CRAN (R 3.2.0)
 memoise      0.2.1    CRAN (R 3.2.0)
 RCurl        1.95.4.3 CRAN (R 3.2.0)
 rstudioapi   0.1      CRAN (R 3.2.0)
 stringr      0.6.2    CRAN (R 3.2.0)
 whisker      0.3.2    CRAN (R 3.2.0)
```
Small edit: At least for `makeForkCluster()`, it seems that sometimes the forks it spawns are killed and reaped by the parent correctly, and sometimes they are not reaped and become zombies. This seems to happen only when the cluster is not closed quickly enough after the loop aborts or finishes; at least that is when it happened the last few times.
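For what it's worth, the reap-or-zombie mechanism can be reproduced directly with the `parallel` primitives that `mclapply` (and hence `doMC`) builds on. This is a minimal sketch, assuming a Unix-alike where `ps` is available: a forked child that has already exited stays a zombie until the parent collects it, because collecting is what performs the `waitpid()`:

```r
library(parallel)  # provides mcparallel()/mccollect() on Unix-alikes

p <- mcparallel(invisible(NULL))  # fork a child that exits immediately
Sys.sleep(0.5)                    # give the child time to exit

# Before collecting, the child's state column shows "Z" (defunct):
system(sprintf("ps -o pid,stat,comm -p %d", p$pid))

# mccollect() waits on the child, i.e. the parent calls waitpid(),
# which is what reaps the zombie:
mccollect(p)

# Afterwards the pid should be gone from the process table:
system(sprintf("ps -o pid,stat,comm -p %d", p$pid))
```

If that is also what happens inside the three methods above, the zombies would simply be children that exited between the end of the loop and the (late or missing) collect/cleanup step.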