1 vote

I want to do a distributed parallel computation with a cluster I have access to: 5 nodes ("computers"); each node has two processors ("CPUs"), and each processor has 18 cores.

So the number of parallel workers I could use in an embarrassingly parallel computation is 180 (5 * 2 * 18).

I have found out that I cannot use the standard parallel R functions across the nodes of a cluster. Instead, I need to use MPI. doMPI seems ideal for this task, since it implements a foreach backend for MPI, as described in the vignette:

https://cran.r-project.org/web/packages/doMPI/vignettes/doMPI.pdf
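
The basic pattern from the vignette, as I understand it, is the following sketch (doSomething is just a placeholder for one of my tasks):

library(doMPI)

cl <- startMPIcluster(count=2)
registerDoMPI(cl)

results <- foreach(i=1:180) %dopar% {
  doSomething(i)   # placeholder for one embarrassingly parallel task
}

closeCluster(cl)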

I have a question about MPI: when one writes, for example:

cl <- startMPIcluster(count=2)

what does this 2 mean? The number of nodes in the cluster to be used? The number of cores to be used?

If 2 represents the number of nodes to be used, will doMPI be able to use the 2 processors, and the 2*18 = 36 cores, in each node? Or do I have to tell doMPI something else so that I can use those 36 cores?

If 2 represents the number of cores, then everything seems easier. But that choice would be odd, because if the cluster is in fact bigger than 5 nodes (and I have been allocated 5 nodes on a pro-rata quota), there is no clear rule for whether doMPI should use as few nodes as possible (and all cores within each node) or as many nodes as possible (and as few cores within each node as possible).

So, my question is then:

If I want to run a loop of 180 embarrassingly parallel tasks (or 360, or 1800), should I use cl <- startMPIcluster(count=5), or cl <- startMPIcluster(count=180), or something else, so that all 180 available cores are used?

Thank you for your help.


1 Answer

1 vote

The count parameter is "the number of workers to spawn." If you want to use all 180 cores in your cluster, you have two main options:

  1. Use startMPIcluster(count=180). This will spawn 180 worker processes (see the sketch after this list).
  2. Use mpirun -np 180 Rscript myscript.r. This will launch 180 instances of R with MPI set up from the beginning; the MPI "size" will be 180 and the "rank" of the processes will be 0 through 179.
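
For option 1, a complete script might look roughly like this sketch (myTask here is just a placeholder for one of your 180 tasks, not something provided by doMPI):

library(doMPI)

# Option 1: spawn 180 workers from within the script.
cl <- startMPIcluster(count=180)
registerDoMPI(cl)

# 180 (or 360, or 1800) embarrassingly parallel tasks,
# handed out to the 180 workers by the doMPI backend.
results <- foreach(i=1:180) %dopar% {
  myTask(i)   # placeholder for the real per-task work
}

closeCluster(cl)
mpi.quit()

For option 2 you would launch essentially the same script with mpirun and, as described in the vignette linked in the question, call startMPIcluster() without the count argument so that it picks up the processes mpirun already started.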

Either of these options is fine. You could blend them too, e.g. mpirun -np 10 and then have each of those jobs spawn workers with count=15, or whatever. But given what you've told us so far, I'd say you should stick with the simpler approaches above.

As a general note, whenever MPI talks about numbers of processes, workers, or jobs, each of those is typically executed on one core. Usually the number of nodes or the number of sockets per node is not the first thing you need to worry about (they might be worth considering later as optimizations).
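
If you want to sanity-check that you really got the number of workers you asked for, you can print the worker count from the master before the loop; if I remember correctly, clusterSize() is the doMPI function for that:

# after cl <- startMPIcluster(count=180)
cat("Number of doMPI workers:", clusterSize(cl), "\n")   # should print 180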