
I'm running MPI jobs on a SLURM cluster and want to pin the resulting processes to specific cores on each node. Different nodes may run different numbers of processes and use different pinning patterns. This is all relatively easy if I take an allocation of nodes using salloc, construct a rank file for the MPI processes, and start the processes using mpirun.
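For reference, generating such a rank file can be sketched as below. This is a minimal sketch assuming Open MPI's rankfile syntax (`rank N=<host> slot=<core>`); the helper `make_rankfile`, the host names and the core numbers are all hypothetical. Whether slot numbers are logical or physical core indexes depends on the Open MPI version, so check the mpirun man page for yours.

```python
"""Sketch: write an Open MPI-style rankfile from a per-node core list.

Assumes the rankfile line format "rank N=<host> slot=<core>". The host
names and core numbers in the example are hypothetical.
"""


def make_rankfile(pinning):
    """Return rankfile text for ``pinning``, a list of (host, [cores]).

    Ranks are numbered consecutively, node by node, in the order given,
    so each node can run a different number of processes with its own
    core pattern.
    """
    lines = []
    rank = 0
    for host, cores in pinning:
        for core in cores:
            lines.append(f"rank {rank}={host} slot={core}")
            rank += 1
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical layout: 2 ranks on node01, 3 ranks on node02.
    layout = [("node01", [0, 8]), ("node02", [0, 1, 2])]
    print(make_rankfile(layout), end="")
```

The output would then be passed to Open MPI's mpirun via its `-rf`/`--rankfile` option.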

Unfortunately, if the allocation is revoked for any reason, such as a timeout or pre-emption, the processes running on the compute nodes are killed, but the mpirun process on the head node is not. Instead, it goes from generating negligible load to keeping one thread at 100% load indefinitely.

It would appear that the answer is to launch the application with srun instead of mpirun, but I am struggling to find out how to control process placement if I do this. Does anyone have any suggestions?

Any reason why you use salloc rather than sbatch? Is the program interactive? If not, with sbatch your main mpirun process would be killed along with the job. - damienfrancois
Because of the nature of the experiments I am running, I allocate a large number of machines using salloc and then use srun to start specific jobs within this allocation. I want to be sure I have all the required machines before starting anything, hence salloc. - Daniel Goodman

1 Answer


I'm no SLURM expert, and without knowing the specifics of what you're trying to do, we can't give you a specific answer. However, what you want is probably covered in the SLURM documentation:

https://computing.llnl.gov/linux/slurm/mc_support.html

That page documents many different ways to bind tasks to cores, sockets, and so on. If you want to bind processes to specific individual cores, you probably want srun's --cpu-bind option with map_cpu (spelled --cpu_bind in older SLURM releases).
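As a rough illustration, the srun command line could be assembled like this. It is a sketch under stated assumptions: the helper `srun_map_cpu`, the program `./my_mpi_app` and the CPU IDs are hypothetical. With `--cpu-bind=map_cpu:<list>`, SLURM binds the task with local ID i on a node to the i-th CPU in the list, so one list is applied identically on every node of the step; differing patterns per node are not covered by this sketch. The optional `verbose` prefix asks SLURM to report each task's binding.

```python
"""Sketch: build an srun command that pins tasks to cores with map_cpu.

Hypothetical helper; the program name and CPU IDs are made up. SLURM
applies the map_cpu list per node, indexed by local task ID.
"""

import shlex


def srun_map_cpu(cpus, nodes, program, *args, verbose=False):
    """Return an argv list for srun that binds local task i to cpus[i].

    One task is started per listed CPU on each of ``nodes`` nodes.
    """
    if not cpus:
        raise ValueError("need at least one CPU ID")
    if len(set(cpus)) != len(cpus):
        # Duplicate IDs would bind two tasks to the same core.
        raise ValueError("CPU IDs must be unique")
    bind = "map_cpu:" + ",".join(str(c) for c in cpus)
    if verbose:
        # 'verbose' makes SLURM report each task's binding at launch.
        bind = "verbose," + bind
    return [
        "srun",
        f"--nodes={nodes}",
        f"--ntasks-per-node={len(cpus)}",
        f"--cpu-bind={bind}",
        program,
        *args,
    ]


if __name__ == "__main__":
    # Hypothetical example: 2 nodes, tasks on cores 0 and 8 of each node.
    cmd = srun_map_cpu([0, 8], 2, "./my_mpi_app")
    print(shlex.join(cmd))
```

Running it prints `srun --nodes=2 --ntasks-per-node=2 --cpu-bind=map_cpu:0,8 ./my_mpi_app`. Launching MPI ranks directly with srun also assumes your MPI library was built with SLURM/PMI support.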

There may also be documentation for your specific system about how to do this on your machine. For instance, Argonne National Laboratory's site explains how to do it specifically on their IBM BG/Q:

http://www.alcf.anl.gov/user-guides/running-jobs#mapping-of-mpi-tasks-to-cores