I'm having an issue with OpenMPI where the MPI ranks of separate runs are repeatedly bound to the same CPU cores.
I'm using a server with 32 hardware cores (no hyper-threading), running Ubuntu 14.04.2 LTS, with OpenMPI 1.8.4 compiled with the Intel compiler 15.0.1.
For instance, I can run my executable with 8 MPI ranks and get the following rank-to-core bindings:
$ mpirun -n 8 --report-bindings ./executable
[simple:16778] MCW rank 4 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.][./././././././.][./././././././.]
[simple:16778] MCW rank 5 bound to socket 1[core 9[hwt 0]]: [./././././././.][./B/./././././.][./././././././.][./././././././.]
[simple:16778] MCW rank 6 bound to socket 2[core 17[hwt 0]]: [./././././././.][./././././././.][./B/./././././.][./././././././.]
[simple:16778] MCW rank 7 bound to socket 3[core 25[hwt 0]]: [./././././././.][./././././././.][./././././././.][./B/./././././.]
[simple:16778] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.]
[simple:16778] MCW rank 1 bound to socket 1[core 8[hwt 0]]: [./././././././.][B/././././././.][./././././././.][./././././././.]
[simple:16778] MCW rank 2 bound to socket 2[core 16[hwt 0]]: [./././././././.][./././././././.][B/././././././.][./././././././.]
[simple:16778] MCW rank 3 bound to socket 3[core 24[hwt 0]]: [./././././././.][./././././././.][./././././././.][B/././././././.]
which works as expected.
The problem is that if I run this command a second time (for a run in a different folder), I get exactly the same bindings again. That means that out of 32 CPU cores, 8 carry the load twice while the remaining 24 do nothing.
I am aware of the various mpirun options to bind by core, socket, etc. I could, for instance, explicitly specify the cores to use with the --cpu-set argument; more generally, there is the ranking policy:
--rank-by Ranking Policy [slot (default) | hwthread | core | socket | numa | board | node]
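For example, I could keep two concurrent runs apart by hand with explicit, disjoint core lists (the particular split below is just an illustration, not something I've settled on):

$ mpirun -n 8 --cpu-set 0,1,2,3,4,5,6,7 --bind-to core --report-bindings ./executable
$ mpirun -n 8 --cpu-set 8,9,10,11,12,13,14,15 --bind-to core --report-bindings ./executable   # second run, from the other folder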
What I'm looking for, instead, is a way to automatically distribute the load over whichever CPU cores are free, rather than reusing the same cores twice. Is there some policy that controls this?
Comments:

If you can't do this through mpirun directly, you might try writing a shell script that detects cores not running your program and calls mpirun accordingly for you. – suszterpatt

One mpirun has no way of knowing which cores are used by another. That's why things like distributed resource managers (DRMs, also called batch queueing systems) exist. When properly configured, DRMs that understand node topologies can usually provide the necessary information to the MPI library and thus prevent two MPI jobs from binding to the same set of cores. – Hristo Iliev

An alternative would then be using --bind-to none and letting the Linux kernel distribute the load among the CPU cores as it would for normal processes. Is that considered bad practice in HPC? – rth

You could also write a wrapper around mpirun that keeps track of which cores are used (e.g. in a shared text file) and that generates the proper binding options (or a rankfile). – Hristo Iliev
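For reference, the unbound launch suggested above would simply be:

$ mpirun --bind-to none -n 8 ./executable

With binding disabled, the kernel scheduler is free to migrate the ranks across all 32 cores like ordinary processes, at the cost of losing cache and NUMA locality.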
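And a minimal sketch of the wrapper idea from the last comment, assuming bash and flock(1); the script name, the file paths, and the --cpu-set/--bind-to core combination are illustrative choices, not a standard tool:

#!/bin/bash
# Hypothetical wrapper around mpirun (illustrative, not a standard tool).
# Usage: ./mpirun-free.sh <nranks> <executable> [args...]

TOTAL_CORES=32                     # the 32-core machine from the question
USED_FILE=/tmp/mpi-used-cores.txt  # shared record of cores claimed by running jobs
LOCK_FILE=/tmp/mpi-used-cores.lock

nranks=$1; shift

exec 200>"$LOCK_FILE"
flock -x 200                       # serialize claims between concurrent invocations
touch "$USED_FILE"

# Collect the first $nranks cores that are not yet claimed.
free=()
c=0
while (( c < TOTAL_CORES && ${#free[@]} < nranks )); do
    grep -qx "$c" "$USED_FILE" || free+=("$c")
    c=$((c + 1))
done
if (( ${#free[@]} < nranks )); then
    echo "error: only ${#free[@]} of $nranks requested cores are free" >&2
    exit 1
fi

printf '%s\n' "${free[@]}" >> "$USED_FILE"
flock -u 200                       # don't hold the lock while the job runs

cpuset=$(IFS=,; echo "${free[*]}")  # e.g. "8,9,10,11,12,13,14,15"
mpirun -n "$nranks" --cpu-set "$cpuset" --bind-to core "$@"
status=$?

flock -x 200                       # release our cores for later invocations
for c in "${free[@]}"; do
    sed -i "/^${c}\$/d" "$USED_FILE"
done
flock -u 200
exit "$status"

Each invocation claims the first free cores under the lock, launches mpirun with an explicit core list, and deletes its claims when the job exits; stale entries from crashed jobs would still need manual cleanup.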