f2py - automatic multithreading?

Question

I am currently working on a Python code and to gain some speed I used f2py to port some existing Fortran code. Everything works well and the speedup is amazing. However, I found that the code seems to run on multiple threads now (according to htop), which is something I did not specify anywhere (maybe this is done intrinsically by f2py?).

Here's the command I use to create the module:

f2py --f90exec="gfortran" --f90flags="" --noopt \
$(ACMLLIB) $(FFTLIB) $(ACMLINC) $(FFTINC) -c -m fmod myCode.f90

where the variables $(ACMLLIB) $(FFTLIB) $(ACMLINC) and $(FFTINC) are paths to the libraries.

When I run the script, it appears to use all the cores it can find. I don't mind that, but I want to at least be able to control it. How can I do this, e.g. by setting the number of threads?

I suspect this has something to do with the -pthread option here:

....

compiling C sources

C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC

....

This is a piece of the massive output after I compile the Fortran module. I have no idea how to handle this.

What produces the threads? The Python part? The Fortran part? What exactly is run in parallel? It should not happen on its own, please show some code. — Vladimir F
I don't know what exactly produces the parallelization and/or what is run in parallel. When I open htop in the terminal and then run the code, it shows that all 4 threads of my machine are occupied. How can I check which part is run in parallel? — rammelmueller
By investigating. Sorry, can't say more without seeing anything. — Vladimir F
All the code won't fit here. Which part would you need to see? What I do is, I take the existing pure Python version (which only shows activity on one thread) and replace one function with a Fortran version. I get great speedup and the results are exactly the same, but I see activity on 4 threads. No openMP was used (and no flags either). — rammelmueller

Pierre de Buyl · Accepted Answer · 2017-03-22T08:40:30

ACML, the (now end-of-lifed) math library by AMD, can use multiple cores, see http://developer.amd.com/tools-and-sdks/archive/compute/amd-core-math-library-acml/acml-product-features/

This is most probably what you are seeing. There is a copy of the docs here: https://engineering.ucsb.edu/~stefan/acml.pdf, which mentions the environment variable OMP_NUM_THREADS for controlling the number of cores/threads to use. That is the standard OpenMP environment variable.
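
As a minimal sketch of how that could look from the Python side: the snippet below sets OMP_NUM_THREADS before the extension module is imported. It assumes the ACML library linked into fmod is the OpenMP (multithreaded) variant, which the question does not confirm, and that the OpenMP runtime reads the variable when it initializes, so the variable has to be set before the import. The helper name limit_openmp_threads is made up for illustration.

```python
import os


def limit_openmp_threads(n_threads):
    """Set OMP_NUM_THREADS for this process (hypothetical helper).

    Call this before importing the module that links the threaded library;
    changing the variable after the runtime has started may have no effect.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    os.environ["OMP_NUM_THREADS"] = str(n_threads)
    return os.environ["OMP_NUM_THREADS"]


# Ask for two threads instead of every core on the machine.
limit_openmp_threads(2)

# Only now import the f2py-built extension (name taken from "-m fmod").
# import fmod
```

The same effect can usually be had without touching the script by exporting OMP_NUM_THREADS in the shell that launches Python; htop should then show the reduced number of busy cores if ACML is indeed the source of the threads.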
