
Let's say I have started an MPI job with 256 cores on 16 nodes.

I have an MPI program, but unfortunately it is not parallel over one particular parameter. Fortunately, I can easily write my own MPI program that handles the parallelization over that parameter, provided I can obtain the output files.

So, how can I start an MPI job (from within an MPI job) that uses a particular subset of these cores, namely a single node? Basically, I want to run 16 distinct MPI calculations, each with 16 cores, from within a single 256-core MPI job. Each calculation takes about 10 minutes with 16 cores, and there are about 200 iterations in the outer loop. With 256 cores, that is a reasonable 32 hours or so. It is not feasible either to resubmit 200 times or to run these 16 calculations sequentially.

To be even more precise, here is some Python pseudo-code for what I want to do:

import os
from ase.parallel import rank

while True:
    node = rank // 16       # which of the 16 sub-calculations this rank belongs to
    subrank = rank % 16     # local rank within that sub-calculation
    os.chdir(mydir + "Calculation_%d" % node)
    # This will not work: one would need to specify somehow that only
    # ranks node*16 to node*16+15 (i.e. a single node) take part in this mpirun
    os.system("mpirun -n 16 nwchem input.nw > nwchem.out")
    analyse_output(mydir + "Calculation_%d/nwchem.out" % node)
    rewrite_input_files()

Basically, with 16 cores and 4 jobs (4 cores per job):

rank 0: start nwchem process in /calculation0/ as rank 0/4.
rank 1: start nwchem in /calculation0/ as rank 1/4.
rank 2: start nwchem in /calculation0/ as rank 2/4.
rank 3: start nwchem in /calculation0/ as rank 3/4.
rank 4: start nwchem in /calculation1/ as rank 0/4.
rank 5: start nwchem in /calculation1/ as rank 1/4.
rank 6: start nwchem in /calculation1/ as rank 2/4.
rank 7: start nwchem in /calculation1/ as rank 3/4.
rank 8: start nwchem in /calculation2/ as rank 0/4.
rank 9: start nwchem in /calculation2/ as rank 1/4.
rank 10: start nwchem in /calculation2/ as rank 2/4.
rank 11: start nwchem in /calculation2/ as rank 3/4.
rank 12: start nwchem in /calculation3/ as rank 0/4.
rank 13: start nwchem in /calculation3/ as rank 1/4.
rank 14: start nwchem in /calculation3/ as rank 2/4.
rank 15: start nwchem in /calculation3/ as rank 3/4.

Gather all the results.
Optimize all geometries (this requires knowledge of forces between the calculations).
Repeat until convergence (about 200 times).
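
To illustrate the grouping, it corresponds to splitting the world communicator with color = rank // 4. With mpi4py (just a sketch to show the intent; the nwchem runtime itself does not give me this) it would look like:

from mpi4py import MPI

world = MPI.COMM_WORLD
group_size = 4                     # cores per sub-calculation in this 16-core example
color = world.rank // group_size   # 0..3: which calculation this rank works on

# Ranks with the same color end up in the same sub-communicator
subcomm = world.Split(color, world.rank)

print("world rank %d -> calculation%d as local rank %d/%d"
      % (world.rank, color, subcomm.rank, subcomm.size))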

Background: in case you are interested, I elaborate on the details here. But the main question is still: "How do I instantiate N MPI calculations from within a single MPI job of M cores, so that each gets M / N cores?"

NWChem does not have an image-parallel nudged elastic band (NEB) calculator. Here is an example of this process with a different code, GPAW: https://wiki.fysik.dtu.dk/gpaw/tutorials/neb/neb.html

There it is smooth, because it is so easy to create a sub-communicator with GPAW and its MPI interface. However, I only have the nwchem runtime MPI, and I wish to do the same thing: create many calculators (a band, i.e. a chain of geometries all linked with 'springs') and optimize that chain.

Please see "Should questions include “tags” in their titles?", where the consensus is "no, they should not"! – user57508
I initially thought that this question would be understood the way you did, so I tried to elaborate with pseudo-code. Short version: the outcomes of these calculations are linked and are part of a bigger outer optimization loop, more precisely an image-parallel NEB calculation. wiki.fysik.dtu.dk/gpaw/tutorials/neb/neb.html – Mikael Kuisma
I am fighting the wall time, not the resources. So, sorry, but that really is not an option. I have to look deeper into how mpirun spawns its processes and pins them to cores and nodes. And just test... – Mikael Kuisma

1 Answer


I suppose you are looking for Dynamic Process Management (DPM) in MPI. You can spawn new processes for the smaller jobs and, once the calculations are done, communicate with the spawned processes from the existing ones over the resulting intercommunicator.
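
A minimal sketch of what that could look like from Python with mpi4py (assuming mpi4py is available alongside the batch job's MPI; the way nwchem is invoked below is an assumption, not taken from its documentation):

from mpi4py import MPI

# Only one rank per sub-calculation (e.g. the one with subrank == 0 in the
# question's numbering) would typically issue the spawn.
sub_size = 16
intercomm = MPI.COMM_SELF.Spawn("nwchem",           # executable to launch (assumed invocation)
                                args=["input.nw"],  # assumed argument list
                                maxprocs=sub_size)

# The spawned processes form their own MPI_COMM_WORLD of 16 ranks and run
# independently; the parent holds an intercommunicator to them. Knowing when
# they are done still needs an explicit handshake (e.g. a message from the
# children before they exit), which plain nwchem will not provide.
intercomm.Disconnect()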

Detailed Explanation on DPM

Example program using DPM