8
votes

The following (simplified) script works fine on the master node of a unix cluster (4 virtual cores).

library(foreach)
library(doParallel)

nc = detectCores()
cl = makeCluster(nc)
registerDoParallel(cl)

foreach(i = 1:nrow(data_frame_1), .packages = c("package_1","package_2"), .export = c("variable_1","variable_2"))  %dopar% {     

    row_temp = data_frame_1[i,]
    function(argument_1 = row_temp, argument_2 = variable_1, argument_3 = variable_2)

}

stopCluster(cl)

I would like to take advantage of the 16 nodes in the cluster (16 * 4 virtual cores in total).

I guess all I need to do is change the parallel backend specified by makeCluster. But how should I do that? The documentation is not very clear.

Based on this quite old (2013) post http://www.r-bloggers.com/the-wonders-of-foreach/ it seems that I should change the default type (sock or MPI - which one- would that work on unix?)

EDIT

From this vignette by the authors of foreach:

By default, doParallel uses multicore functionality on Unix-like systems and snow functionality on Windows. Note that the multicore functionality only runs tasks on a single computer, not a cluster of computers. However, you can use the snow functionality to execute on a cluster, using Unix-like operating systems, Windows, or even a combination.

What does you can use the snow functionality mean? How should I do that?

2
I am not using a for loop...Antoine
I would recommend downloading Revolution R open, which has improved statistical libraries and multi-core support. check my answer on this post for more information about RRO. This is not directly related to your question, but it will speed up any mathematical calculations and package which can use multi cores.Bas

2 Answers

9
votes

The parallel package is a merger of the multicore and snow packages, but if you want to run on multiple nodes, you have to make use of the "snow functionality" in parallel (that is, the part of parallel that was derived from snow). Practically speaking, that means you need to call makeCluster with the "type" argument set to either "PSOCK", "SOCK", "MPI" or "NWS" because those are the only cluster types supported by the current version of parallel that support execution on multiple nodes. If you're using a cluster that is managed by knowledgeable HPC sysadmins, you should use "MPI", otherwise it may be easier to use "PSOCK" (or "SOCK" if you have a particular reason to use the "snow" package).

If you choose to create an "MPI" cluster, you should execute the script via R using the mpirun command with the "-n 1" option, and the first argument to makeCluster set to the number of workers that should be spawned. (If you don't know what that means, you may not want to use this approach.)

If you choose to create a "PSOCK" or "SOCK" cluster, the first argument to makeCluster must be a vector of hostnames, and makeCluster will start workers on those nodes via the "ssh" command when makeCluster is executed. That means you must have ssh daemons running on all of the specified hosts.

I've written much more on this subject elsewhere, but hopefully this will help you get started.

1
votes

Here's a partial answer that may send you in the right direction

Based on this quite old (2013) post http://www.r-bloggers.com/the-wonders-of-foreach/ it seems that I should change the default type (fork to MPI but why? would that work on unix?)

fork is a way of spawning background processes on POSIX system. on a single node with n cores, you can spawn n processes in parallel and do work. this doesn't work across multiple machines as they don't share memory. you need a way to get data between them.

MPI is a portable way to communicate between clusters of nodes. An MPI cluster can work across nodes.

What does you can use the snow functionality mean? How should I do that?

snow is a separate package. To make a 16 node MPI cluster with snow, do cl <- makeCluster(16, type = "MPI") but you need to be running R in the right environment, as described Steve Weston's answer and in his answer to a similar question here. (Once you get it running you may also need to modify your loop to use 4 cores on each node.)