Memory usage keep growing with Python's multiprocessing.pool

Question

Here's the program:

#!/usr/bin/python

import multiprocessing

def dummy_func(r):
    pass

def worker():
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    for index in range(0,100000):
        pool.apply_async(worker, callback=dummy_func)

    # clean up
    pool.close()
    pool.join()

I found memory usage (both VIRT and RES) kept growing up till close()/join(), is there any solution to get rid of this? I tried maxtasksperchild with 2.7 but it didn't help either.

I have a more complicated program that calles apply_async() ~6M times, and at ~1.5M point I've already got 6G+ RES, to avoid all other factors, I simplified the program to above version.

EDIT:

Turned out this version works better, thanks for everyone's input:

#!/usr/bin/python

import multiprocessing

ready_list = []
def dummy_func(index):
    global ready_list
    ready_list.append(index)

def worker(index):
    return index

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=16)
    result = {}
    for index in range(0,1000000):
        result[index] = (pool.apply_async(worker, (index,), callback=dummy_func))
        for ready in ready_list:
            result[ready].wait()
            del result[ready]
        ready_list = []

    # clean up
    pool.close()
    pool.join()

I didn't put any lock there as I believe main process is single threaded (callback is more or less like a event-driven thing per docs I read).

I changed v1's index range to 1,000,000, same as v2 and did some tests - it's weird to me v2 is even ~10% faster than v1 (33s vs 37s), maybe v1 was doing too many internal list maintenance jobs. v2 is definitely a winner on memory usage, it never went over 300M (VIRT) and 50M (RES), while v1 used to be 370M/120M, the best was 330M/85M. All numbers were just 3~4 times testing, reference only.

Just speculating here, but queuing a million objects takes up space. Perhaps batching them will help. The docs are not definitive, but the example (search for Testing callback) shows apply_async result being waited on, even when there are callbacks. The wait may be needed to clear a result queue. — tdelaney
So multiprocessing.pool may not be the right tool for me, as callback actually does not do cleanup jobs, is it possible to do cleanup in callback? The problem is that I cannot wait after apply_async() call as in real world worker() takes ~0.1 seconds per request (several HTTP requests). — C.B.
Wild guess: apply_asynch creates an AsynchResult instance. The Pool probably has some reference to these objects, since they must be able to return the result when the computation has finished, but in your loop you are simply throwing them away. Probably you should call get() or wait() on the asynch results at some point, maybe using the callback argument of apply_asynch. — Bakuriu
I think there's a race condition on the EDIT version when you overwrite ready_list. There's a thread which handles the results from the AsyncResults (docs.python.org/2/library/…) and that thread calls the callback. It may be faster simply because you are discarding results. Also, use time.sleep() with a small random delay to simulate work and sprinkle sleeps in your code to catch race conditions. — Javier
maxtasksperchild seems to have fixed the memory leak caused by apply_async on 3.7. — laido yagamii

deddu deddu · Accepted Answer · 2014-01-23T18:01:28

I had memory issues recently, since I was using multiple times the multiprocessing function, so it keep spawning processes, and leaving them in memory.

Here's the solution I'm using now:

def myParallelProcess(ahugearray):
    from multiprocessing import Pool
    from contextlib import closing
    with closing(Pool(15)) as p:
        res = p.imap_unordered(simple_matching, ahugearray, 100)
    return res

Memory usage keep growing with Python's multiprocessing.pool

5 Answers