4
votes

I'm trying to use multiprocessing to speed up a replication loop, but cannot get it to work in Python 2.7. This is a very simplified version of my program, based on the docs and other answers here on SO (e.g. Python multiprocessing pool.map for multiple arguments). I realize that there are a number of questions on multiprocessing, but so far I haven't been able to solve this issue. Hopefully I haven't overlooked anything too trivial.

Code

import itertools
from multiprocessing import Pool

def func(g, h, i):
    return g + h + i

def helper(args):
    args2 = args[0] + (args[1],)
    return func(*args2)

pool = Pool(processes=4)
result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(20)))
print result

This works when using map(...), but not when using pool.map(...).

Error message:

Process PoolWorker-3:
Traceback (most recent call last):
File "C:\Program_\EPD_python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Program_\EPD_python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Program_\EPD_python27\lib\multiprocessing\pool.py", line 85, in worker
task = get()
File "C:\Program_\EPD_python27\lib\multiprocessing\queues.py", line 376, in get
return recv()
AttributeError: 'module' object has no attribute 'helper'

3 Answers

3
votes

The problem is solved by moving the pool code into a main() function and calling it only under an if __name__ == '__main__' guard:

import itertools
from multiprocessing import Pool

def func(g, h, i):
    return g + h + i

def helper(args):
    args2 = args[0] + (args[1],)
    return func(*args2)

def main():
    pool = Pool(processes=4)
    result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(10)))
    print result

if __name__ == '__main__':
    main()

Based on the answer from @ErikAllik I'm thinking that this might be a Windows-specific problem.

edit: Here is a clear and informative tutorial on multiprocessing in python.
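
For readers on Python 3.3 or later, the helper can be dropped in favour of Pool.starmap, which unpacks each argument tuple into the call. The following is a minimal sketch under that assumption, not the code from the answer above. build_args is a hypothetical name introduced here for illustration, and the __main__ guard is kept so that the script also behaves on Windows.

```python
from itertools import repeat
from multiprocessing import Pool


def func(g, h, i):
    return g + h + i


def build_args(n):
    # Hypothetical helper: builds one complete (g, h, i) tuple per task.
    # 2 and 3 are the fixed values used in the question; zip stops when
    # range(n) is exhausted.
    return list(zip(repeat(2), repeat(3), range(n)))


def main():
    # Pool as a context manager and Pool.starmap both need Python 3.3+.
    with Pool(processes=4) as pool:
        result = pool.starmap(func, build_args(20))
    print(result)


if __name__ == '__main__':
    main()
```

Run as a script, this should print the same 20 values as the serial map(...) version, 5 through 24.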

2
votes

There's a fork of multiprocessing called pathos (note: use the version on GitHub) that doesn't need starmap or helpers or any of that other stuff -- its map functions mirror the API of Python's built-in map, so map can take multiple arguments. With pathos, you can also generally do multiprocessing in the interpreter, instead of being stuck in the __main__ block. pathos is due for a release, after some mild updating -- mostly conversion to Python 3.x.

  Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
  [GCC 4.2.1 (Apple Inc. build 5566)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> from pathos.multiprocessing import ProcessingPool    
  >>> pool = ProcessingPool(nodes=4)
  >>>
  >>> def func(g,h,i):
  ...   return g+h+i
  ... 
  >>> pool.map(func, [1,2,3],[4,5,6],[7,8,9])
  [12, 15, 18]
  >>>
  >>> # also can pickle stuff like lambdas 
  >>> result = pool.map(lambda x: x**2, range(10))
  >>> result
  [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  >>>
  >>> # also does asynchronous map
  >>> result = pool.amap(pow, [1,2,3], [4,5,6])
  >>> result.get()
  [1, 32, 729]
  >>>
  >>> # or can return a map iterator
  >>> result = pool.imap(pow, [1,2,3], [4,5,6])
  >>> result
  <processing.pool.IMapIterator object at 0x110c2ffd0>
  >>> list(result)
  [1, 32, 729]
1
votes

On OS X with Python 2.7, your code outputs:

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]

I can see that your Python paths contain EPD_python27, so maybe try a vanilla Python distribution instead of the Enthought Python Distribution.

UPDATE: Please see @fileunderwater's answer for a solution; I've run into this once myself, but had totally forgotten about it :)

Explanation: The problem shows up on Windows because Windows has no fork(), so multiprocessing starts each worker as a fresh Python process and imports your module into it. If the module contains top-level code, that code is executed immediately as the module gets imported, in every worker. Wrapping it in main() and only calling main() conditionally (i.e. inside an if __name__ == '__main__' block) prevents this from happening. On OS X and Linux, Python 2.7 forks the workers from the parent process, so the original code happens to work there, but the guard is still more correct on those platforms and is generally preferred over putting code right in the module.
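
The import behaviour described above can be seen without starting any worker processes. The following is a small sketch, assuming Python 3 (it uses importlib.util). MODULE_SOURCE, import_from_source and demo_module are hypothetical names made up for this illustration. It imports a throwaway module and records which parts of it ran.

```python
import importlib.util
import os
import tempfile

# Source of a throwaway module: one unguarded top-level statement and
# one call protected by the __main__ guard.
MODULE_SOURCE = '''
events = []
events.append("top-level code ran")

def main():
    events.append("main() ran")

if __name__ == "__main__":
    main()
'''


def import_from_source(source, name="demo_module"):
    """Write source to a temporary file and import it under the given name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "w") as fh:
            fh.write(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        # Executing the module runs all of its top-level code, which is
        # what happens when a worker process imports it.
        spec.loader.exec_module(module)
    return module


demo = import_from_source(MODULE_SOURCE)
print(demo.events)  # ['top-level code ran']
```

Because the module is imported rather than run as a script, its __name__ is "demo_module", so the guarded main() call is skipped while the unguarded statement still executes.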