67
votes

I am trying to implement multiprocessing in my code, and so, I thought that I would start my learning with some examples. I used the first example found in this documentation.

from multiprocessing import Pool
def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

When I run the above code I get an AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am also using Python 3.5 if that helps.

3
works perfectly for me (also on python 3.5). - hiro protagonist
Yep, also works for me on Python 3.5 - user554546
I should add that I am also using Jupyter Notebook and anaconda as my interpreter. But anaconda is using python 3.5. - PiccolMan
I get the error on Windows 10 with Anaconda using Python 3.6. - devinbost
I get the error on a regular python shell windows 10 Python 3.6 - rockikz

3 Answers

85
votes

This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. For some reason Pool does not always work with objects not defined in an imported module. So you have to write your function into a different file and import the module.

File: defs.py

def f(x):
    return x*x

File: run.py

from multiprocessing import Pool
import defs

 if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(defs.f, [1, 2, 3]))

If you use print or a different built-in function, the example should work. If this is not a bug (according to the link), the given example is chosen badly.

17
votes

The multiprocessing module has a major limitation when it comes to IPython use:

Functionality within this package requires that the __main__ module be importable by the children. [...] This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter. [from the documentation]

Fortunately, there is a fork of the multiprocessing module called multiprocess which uses dill instead of pickle to serialization and overcomes this issue conveniently.

Just install multiprocess and replace multiprocessing with multiprocess in your imports:

import multiprocess as mp

def f(x):
    return x*x

with mp.Pool(5) as pool:
    print(pool.map(f, [1, 2, 3, 4, 5]))

Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.

<tl;dr> multiprocessing does not work in IPython environments right away, use its fork multiprocess instead.

0
votes

If you're using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it's more work. Defining the function before, i.e. above the pool, isn't adequate. It has to be in a completely different notebook cell which is executed first.