I am wondering how starting a Pool of workers to run a task in parallel differs from starting individual processes, specifically with respect to pickling and distributing the jobs.
I have a task (here do_my_job) whose objects cannot be pickled, so I cannot start a pool of workers to execute it in parallel. The following snippet does NOT work, where iterator yields the different parameter settings for do_my_job:
import multiprocessing as multip
mpool = multip.Pool(ncores)
mpool.map(do_my_job, iterator)
mpool.close()
mpool.join()
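For illustration, here is a minimal sketch of the kind of object I mean; the UnpicklableTask class and the lock it carries are just hypothetical stand-ins for my real objects, but they trigger the same kind of pickling failure with Pool.map:

import multiprocessing as multip
import threading

class UnpicklableTask:
    """Hypothetical stand-in: carries a resource that pickle refuses to serialize."""
    def __init__(self):
        self.lock = threading.Lock()  # thread locks cannot be pickled
    def __call__(self, params):
        with self.lock:
            print("processed", params)

if __name__ == "__main__":
    do_my_job = UnpicklableTask()
    mpool = multip.Pool(2)
    # Fails with a pickling error: Pool.map has to serialize the callable
    # and each argument in order to send them to the worker processes.
    mpool.map(do_my_job, range(4))
    mpool.close()
    mpool.join()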
Yet, the following code snippet DOES work:
import time
import multiprocessing as multip

keep_running = True
process_list = []
while len(process_list) > 0 or keep_running:
    # Collect the indices of processes that have finished.
    terminated_procs = []
    for idx, proc in enumerate(process_list):
        if not proc.is_alive():
            terminated_procs.append(idx)
    # Pop from the back so the remaining indices stay valid.
    for terminated_proc in reversed(terminated_procs):
        process_list.pop(terminated_proc)
    # Start a new process as long as a core is free and parameters remain.
    if len(process_list) < ncores and keep_running:
        try:
            task = next(iterator)
            proc = multip.Process(target=do_my_job,
                                  args=(task,))
            proc.start()
            process_list.append(proc)
        except StopIteration:
            keep_running = False
    time.sleep(0.1)
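For reference, here is the second approach packaged as a self-contained helper, so the two variants are easy to compare side by side; run_with_processes and example_job are just names I made up for this sketch:

import time
import multiprocessing as multip

def run_with_processes(job, params_iter, ncores):
    """Keep at most ncores processes alive, feeding them parameters from params_iter."""
    process_list = []
    keep_running = True
    while process_list or keep_running:
        # Forget processes that have already finished.
        process_list = [proc for proc in process_list if proc.is_alive()]
        if len(process_list) < ncores and keep_running:
            try:
                task = next(params_iter)
            except StopIteration:
                keep_running = False
            else:
                proc = multip.Process(target=job, args=(task,))
                proc.start()
                process_list.append(proc)
        time.sleep(0.1)

def example_job(params):
    """Trivial stand-in for do_my_job."""
    print("processed", params)

if __name__ == "__main__":
    run_with_processes(example_job, iter(range(8)), ncores=2)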
How is my job distributed to the individual processes in the latter case? Is there no pickling of the task and all related objects before a process is started? If not, how are the task and its objects passed to the new processes?