
I am pretty new to Python and I'm unsure of the best way to implement multithreaded/multiprocess code on a distributed cluster.

I am trying to write a wrapper script in Python that calls an external MPI programme running on a large cluster using a PBS queuing system. A (very) simplified version of the type of script I've been working on is given below, where the code moves into a specific directory, runs an external MPI programme and checks the results to see if there have been any large changes.

#!/local/python-2.7.1/bin/python2.7

import os
import subprocess as sp
import coordinate_functions as coord_funcs

os.chdir('/usr/work/cmurray/SeachTest/')
print os.getcwd()

# Get the nodefile and the number of processors (NP) assigned by PBS
cat_np = sp.Popen('cat $PBS_NODEFILE | wc -l', shell=True, stdout=sp.PIPE)
NP = int(cat_np.communicate()[0])
sp.call('cat $PBS_NODEFILE > nodefile', shell=True)
mynodefile = os.path.abspath('nodefile')

def run_mpi(np, nodefile):
    # Run the external MPI programme on the assigned nodes
    mpi_cmd = 'mpirun -machinefile %s -np %d mpipg > calc.out' % (nodefile, np)
    sp.call(mpi_cmd, shell=True)


def search_loop(calc_dir, t_total, nodefile, num_procs):

    os.chdir(calc_dir)
    no_events = True
    t = 0
    # Re-run the calculation until a large change is detected
    # or the maximum number of iterations is reached
    while no_events and t < t_total:
        run_mpi(num_procs, nodefile)
        num_events = coord_funcs.change_test('OUTFILE', 'INFILE', 0.01)
        if num_events > 0:
            no_events = False
        else:
            t += 1

search_loop('/usr/work/cmurray/SeachTest/calc_1/', 10, mynodefile, NP)

This is then submitted to the queue using:

qsub -l nodes=4 -N SeachTest ./SearchTest

What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The process is very I/O heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.

Would the threading module be ok for this purpose or is the multiprocessing module a better choice? I will probably need to pass simple messages like the event boolean in the above example between threads/processes.

Also, how do I make sure that the python script is not using processors that I've assigned to the MPI runs?

Why don't you use the MPI program to run the program on the different nodes? - dbeer
Unfortunately the MPI program is a proprietary piece of software that has been carefully optimised and compiled by our cluster gurus. I have just today begun to use mpi4py instead of multiprocessing. Bit of a paradigm shift (I've seen MPI described as "schizophrenic programming") but it should make the code more scalable and give better control of the node/processor assignment. - CiaranAM

1 Answer


What I want to do is run multiple versions of the search_loop function in parallel in different directories (containing different starting positions, for example) read from a list. The process is very I/O heavy, with the MPI calculations taking maybe a few minutes to run each time they are called.

Would the threading module be ok for this purpose or is the multiprocessing module a better choice? I will probably need to pass simple messages like the event boolean in the above example between threads/processes.

I'd try multithreading first for an I/O-intensive program, assuming that there's enough bandwidth to actually parallelize the I/O.
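
Something along these lines might be a starting point (a rough, untested sketch: I'm assuming a list of calculation directories, and that search_loop is changed to take a Queue as an extra argument so the workers can report events back to the main thread; calc_2 and calc_3 are made-up names):

import threading
import Queue

# Directories containing the different starting positions
calc_dirs = ['/usr/work/cmurray/SeachTest/calc_1/',
             '/usr/work/cmurray/SeachTest/calc_2/',
             '/usr/work/cmurray/SeachTest/calc_3/']

# Thread-safe queue for simple messages (e.g. which directory saw an event)
event_queue = Queue.Queue()

def worker(calc_dir):
    # search_loop would need an extra argument so it can report back,
    # e.g. event_queue.put((calc_dir, num_events)) when it finds something
    search_loop(calc_dir, 10, mynodefile, NP, event_queue)

threads = []
for d in calc_dirs:
    thread = threading.Thread(target=worker, args=(d,))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()

# Drain any messages the workers posted
while not event_queue.empty():
    print event_queue.get()

One thing to watch with threads: os.chdir changes the working directory of the whole process, so a threaded search_loop would need to stop chdir-ing and instead pass explicit paths (or use the cwd argument of subprocess) for each run.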

Also, how do I make sure that the python script is not using processors that I've assigned to the MPI runs?

If you don't use multiprocessing, the script itself will only ever use a single CPU because of the Global Interpreter Lock, and in practice even less than that: the wrapper spends almost all of its time blocked in subprocess.call waiting for mpirun to finish, so its own CPU usage is negligible.
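
If you want to be doubly sure the wrapper never competes with the MPI ranks for a core, one option (assuming your mpirun and queue set-up tolerate it) is simply to hold one processor back from each MPI run:

# Leave one of the NP processors assigned by PBS free for the
# Python wrapper itself (assumes NP > 1)
run_mpi(NP - 1, mynodefile)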