
I use Python's multiprocessing module to run a single task across multiple cores, but now I want to run several independent processes in parallel.

For example, Process1 parses large files, Process2 finds patterns in different files and Process3 does some calculation. Can these three processes, which take different sets of arguments, be run in parallel?

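import sys

def Process1(largefile):
    # Parse large file (runtime: ~2 hrs)
    parsed_file = ...  # placeholder for the parsing code
    return parsed_file

def Process2(bigfile):
    # Find pattern in big file (runtime: ~2.5 hrs)
    pattern = ...  # placeholder for the pattern search
    return pattern

def Process3(integer):
    # Do astronomical calculation (runtime: ~2.25 hrs)
    calculation_results = ...  # placeholder for the calculation
    return calculation_results

def FinalProcess(parsed, pattern, calc_results):
    # Do analysis (runtime: ~10 min)
    final_results = ...  # placeholder for the analysis
    return final_results

def main():
    # largefile, bigfile and integer are the inputs, defined elsewhere
    parsed = Process1(largefile)
    pattern = Process2(bigfile)
    calc_res = Process3(integer)
    final = FinalProcess(parsed, pattern, calc_res)

if __name__ == '__main__':
    main()
    sys.exit()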

In the above pseudo-code, Process1, Process2 and Process3 are single-core processes, i.e. they can't be run on multiple processors. Run sequentially, they take 2 + 2.5 + 2.25 hrs = 6.75 hrs. Is it possible to run these three processes in parallel, so that they run at the same time on different processors/cores, and then move on to FinalProcess as soon as the longest-running one (Process2) finishes?
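As a rough check of what parallelism would buy here, the sketch below compares the two wall-clock estimates using the runtimes quoted above. It assumes three idle cores and treats process start-up and result transfer as negligible, which is an approximation.

```python
# Rough wall-clock estimate using the runtimes quoted in the question (in hours).
task_hours = {"Process1": 2.0, "Process2": 2.5, "Process3": 2.25}
final_hours = 10 / 60  # FinalProcess takes about 10 minutes

# Run one after another: the runtimes add up.
sequential = sum(task_hours.values()) + final_hours

# Run side by side on three free cores: the slowest task dominates.
parallel = max(task_hours.values()) + final_hours

print(f"sequential: {sequential:.2f} h, parallel: {parallel:.2f} h")
```

This prints `sequential: 6.92 h, parallel: 2.67 h`, so the best case is bounded by Process2 plus the final analysis.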

Have you looked at the subprocess module? - Tim Peters
You probably want to create either a thread or a child process in which to do your data processing. - Emmett Butler
@Emmett, he can't use threading unless he's using Jython, IronPython or something else that skips the GIL. - planestepper
Ah, that's a great point @leon. Running one thread per core will still take the same amount of time as running all of these processes serially, due to the GIL. - Emmett Butler
@Emmett, yes... and that's a strong weakness. - planestepper
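The child-process route suggested in the comments can be sketched with multiprocessing.Process directly, one process per task, each sending its result back through a multiprocessing.Queue. The functions process1/process2/process3, the helper run_and_report and the inputs are hypothetical stand-ins for the question's long-running functions; this is a minimal sketch, not a drop-in implementation.

```python
from multiprocessing import Process, Queue

# Hypothetical stand-ins for the question's long-running, single-core functions.
def process1(largefile):
    return f"parsed:{largefile}"

def process2(bigfile):
    return f"pattern:{bigfile}"

def process3(integer):
    return integer * integer

def run_and_report(name, func, arg, queue):
    # Runs in the child process; sends the result back tagged with its name.
    queue.put((name, func(arg)))

def main():
    queue = Queue()
    jobs = [
        ("parsed", process1, "large.txt"),
        ("pattern", process2, "big.txt"),
        ("calc", process3, 12),
    ]
    procs = [Process(target=run_and_report, args=(name, func, arg, queue))
             for name, func, arg in jobs]
    for p in procs:
        p.start()  # all three children now run at the same time
    # Drain the queue before join(): a child that still has queued data
    # may not exit until that data has been consumed.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(main())
```

Because each task runs in a separate process with its own interpreter, the GIL limitation discussed above does not apply. Results arrive in completion order, which is why they are tagged by name.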

1 Answer

From the Python 2 multiprocessing documentation, section 16.6.1.5 ("Using a pool of workers"):

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"

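You can therefore submit each of your three functions to a pool with apply_async and collect the results once all of them have finished. With processes=3, each function gets its own worker process, so the total runtime is roughly that of the slowest one (Process2) plus FinalProcess: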

from multiprocessing import Pool

# all your function definitions from above go here
# (...)

def main():
    pool = Pool(processes=3)
    parsed = pool.apply_async(Process1, [largefile])
    pattern = pool.apply_async(Process2, [bigfile])
    calc_res = pool.apply_async(Process3, [integer])

    pool.close()
    pool.join()

    final = FinalProcess(parsed.get(), pattern.get(), calc_res.get())

# your __main__ handler goes here
# (...)
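For a self-contained version of the same Pool pattern that can be run as-is, the sketch below swaps in trivial, hypothetical bodies for Process1, Process2, Process3 and FinalProcess and passes the inputs to main as parameters. It uses Pool as a context manager (Python 3.3+). Note that leaving the with-block calls pool.terminate(), so the results must be collected with get() inside it.

```python
from multiprocessing import Pool

# Hypothetical stand-ins for the question's functions (real ones take hours).
def Process1(largefile):
    return len(largefile)

def Process2(bigfile):
    return bigfile.upper()

def Process3(integer):
    return integer ** 2

def FinalProcess(parsed, pattern, calc_results):
    return (parsed, pattern, calc_results)

def main(largefile, bigfile, integer):
    # One worker per task, so all three run at the same time.
    with Pool(processes=3) as pool:
        parsed = pool.apply_async(Process1, (largefile,))
        pattern = pool.apply_async(Process2, (bigfile,))
        calc_res = pool.apply_async(Process3, (integer,))
        # get() blocks until that task finishes and re-raises any exception
        # raised in the worker. Call it before leaving the with-block,
        # because exiting the block terminates the pool.
        results = (parsed.get(), pattern.get(), calc_res.get())
    return FinalProcess(*results)

if __name__ == "__main__":
    print(main("large.txt", "big.txt", 7))
```

This prints `(9, 'BIG.TXT', 49)`. With the real functions, the three get() calls together wait about as long as the slowest task.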