Running parallel Python threads that call an external MPI programme submitted using PBS

Question

I am pretty new to Python and unsure of the best way to implement multithreaded/multiprocess code on a distributed cluster.

I am trying to write a wrapper script in Python that calls an external MPI programme running on a large cluster with a PBS queuing system. A (very) simplified version of the type of script I've been working on is given below: the code moves into a specific directory, runs an external MPI programme and checks the results to see if there have been any large changes.

#!/local/python-2.7.1/bin/python2.7

import os
import subprocess as sp
import coordinate_functions as coord_funcs

os.chdir('/usr/work/cmurray/SeachTest/')
print(os.getcwd())

# Count the processors (NP) in the PBS node file and keep a local copy of it
cat_np = sp.Popen('cat $PBS_NODEFILE | wc -l', shell=True, stdout=sp.PIPE)
NP = int(cat_np.communicate()[0])
sp.call('cat $PBS_NODEFILE > nodefile', shell=True)
# Absolute path, so the file is still found after search_loop changes directory
mynodefile = os.path.abspath('nodefile')


def run_mpi(nodefile, num_procs):
    # Run the external MPI programme, writing its output to calc.out
    mpi_cmd = 'mpirun -machinefile %s -np %d mpipg > calc.out' % (nodefile, num_procs)
    sp.call(mpi_cmd, shell=True)


def search_loop(calc_dir, t_total, nodefile, num_procs):
    os.chdir(calc_dir)
    t = 0
    no_events = True
    # Repeat until an event is detected or t_total runs have been made
    while no_events and t < t_total:
        run_mpi(nodefile, num_procs)
        num_events = coord_funcs.change_test('OUTFILE', 'INFILE', 0.01)
        if num_events > 0:
            no_events = False
        else:
            t += 1

search_loop('/usr/work/cmurray/SeachTest/calc_1/', 10, mynodefile, NP)

This is then submitted to the queue using:

qsub -l nodes=4 -N SeachTest ./SearchTest

What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The process is very I/O heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.

Would the threading module be ok for this purpose or is the multiprocessing module a better choice? I will probably need to pass simple messages like the event boolean in the above example between threads/processes.

Also, how do I make sure that the Python script is not using processors that I've assigned to the MPI runs?

Why don't you use the MPI program to run the program on the different nodes? — dbeer
Unfortunately the MPI program is a proprietary piece of software that has been carefully optimised and compiled by our cluster gurus. I have just today begun to use mpi4py instead of multiprocessing. It is a bit of a paradigm shift (I've seen MPI described as "schizophrenic programming"), but it should make the code more scalable and give better control of the node/processor assignment. — CiaranAM

Fred Foo · Accepted Answer · 2011-11-15T16:04:08

What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The process is very I/O heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.

Would the threading module be ok for this purpose or is the multiprocessing module a better choice? I will probably need to pass simple messages like the event boolean in the above example between threads/processes.

I'd try multithreading first for an I/O-intensive program, assuming that there's enough bandwidth to actually parallelize the I/O.
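A minimal sketch of what the threaded version might look like, assuming one worker thread per calculation directory. Note that os.chdir changes the working directory of the whole process, so it affects every thread; the sketch passes cwd= to subprocess instead. The names search_worker and run_all are hypothetical, cmd is a placeholder for the real mpirun command, and treating a non-zero exit status as "an event occurred" is only a stand-in for the change_test check in the question.

```python
import subprocess
import threading


def search_worker(calc_dir, cmd, max_runs, found_event, results, lock):
    """Run cmd in calc_dir up to max_runs times.

    Stops early once any worker has signalled an event.
    """
    runs = 0
    while runs < max_runs and not found_event.is_set():
        # cwd= sets the directory for the child process only;
        # os.chdir would change it for every thread at once.
        status = subprocess.call(cmd, cwd=calc_dir)
        runs += 1
        if status != 0:
            # Placeholder for a real event test on the output files.
            found_event.set()
    with lock:
        results[calc_dir] = runs


def run_all(calc_dirs, cmd, max_runs):
    """Start one worker thread per directory and wait for all of them."""
    found_event = threading.Event()  # simple shared boolean message
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=search_worker,
            args=(d, cmd, max_runs, found_event, results, lock),
        )
        for d in calc_dirs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread spends almost all of its time blocked while its child process runs, so the threads themselves need very little CPU. A threading.Event is enough for a simple shared flag like the event boolean in the question.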

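Also, how do I make sure that the Python script is not using processors that I've assigned to the MPI runs?

If you don't use multiprocessing, the Python script itself will only use a single CPU at a time because of the Global Interpreter Lock. The MPI programs it launches are separate processes and are not restricted by the lock.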
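If several MPI runs are active at once, they also need to be kept off each other's processors. One possible approach, which is an assumption here rather than something the question confirms will work on this cluster, is to split the entries of $PBS_NODEFILE into one slice per concurrent run and give each mpirun its own machinefile. The helper name split_nodes is hypothetical, and the sketch assumes the node file lists one line per allocated processor.

```python
def split_nodes(node_lines, n_jobs):
    """Split node-file entries into n_jobs contiguous, near-equal chunks.

    node_lines is a list of host names, one per allocated processor.
    Contiguous chunks keep entries that are adjacent in the file together.
    """
    if n_jobs < 1 or n_jobs > len(node_lines):
        raise ValueError("n_jobs must be between 1 and the number of entries")
    base, extra = divmod(len(node_lines), n_jobs)
    chunks = []
    start = 0
    for i in range(n_jobs):
        # The first `extra` chunks get one additional entry each.
        size = base + (1 if i < extra else 0)
        chunks.append(node_lines[start:start + size])
        start += size
    return chunks
```

Each chunk could then be written to its own file and passed to -machinefile, with -np set to the length of that chunk.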
