0
votes

I am currently working on some Python code, and to gain some speed I used f2py to port some existing Fortran code. Everything works well and the speedup is amazing. However, I found that the code now seems to run on multiple threads (according to htop), which is something I did not specify anywhere (maybe this is done intrinsically by f2py?).

Here's the command I use to create the module:

f2py --f90exec="gfortran" --f90flags="" --noopt \
$(ACMLLIB) $(FFTLIB) $(ACMLINC) $(FFTINC) -c -m fmod myCode.f90

where the variables $(ACMLLIB) $(FFTLIB) $(ACMLINC) and $(FFTINC) are paths to the libraries.

When I run the script, it seems to take all the cores it can find. I don't mind that, but I want to at least be able to control it. How can I do this, e.g. by setting the number of threads?

I suspect this has something to do with the -pthread option here:

....

compiling C sources

C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC

....

This is a piece of the massive output after I compile the Fortran module. I have no idea how to handle this.

2
What produces the threads? The Python part? The Fortran part? What exactly is run in parallel? It should not happen on its own, please show some code. - Vladimir F
I don't know what exactly produces the parallelization or what is run in parallel. When I open htop in the terminal and then run the code, it shows that all 4 threads of my machine are occupied. How can I check which part is run in parallel? - rammelmueller
By investigating. Sorry, can't say more without seeing anything. - Vladimir F
All the code won't fit here. Which part would you need to see? What I do is take the existing pure Python version (which only shows activity on one thread) and replace one function with a Fortran version. I get a great speedup and the results are exactly the same, but I see activity on 4 threads. No OpenMP was used (and no flags either). - rammelmueller
From what I see, it must be the Fortran part. - rammelmueller

2 Answers

2
votes

ACML, the (now end-of-lifed) math library by AMD, can use multiple cores; see http://developer.amd.com/tools-and-sdks/archive/compute/amd-core-math-library-acml/acml-product-features/

This is most probably what you are seeing. There is a copy of the docs here: https://engineering.ucsb.edu/~stefan/acml.pdf, which mentions the environment variable OMP_NUM_THREADS for controlling the number of cores/threads to use. That is the standard OpenMP environment variable.
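As a rough sketch of how that could look from the Python side: the snippet below sets OMP_NUM_THREADS before the compiled module is loaded. It assumes that the ACML you linked against is the multi-threaded (OpenMP) build and that its runtime reads the variable when it is first initialized, which is why the variable is set before the import. The helper name limit_openmp_threads is made up for illustration, and fmod is the module name from the question.

```python
import os


def limit_openmp_threads(n_threads):
    """Ask OpenMP-based libraries loaded later to use at most n_threads threads."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Only affects libraries whose OpenMP runtime has not been initialized yet.
    os.environ["OMP_NUM_THREADS"] = str(n_threads)


limit_openmp_threads(1)

# Import the f2py module only after the variable is set (module name taken
# from the question; left commented out so the sketch runs anywhere).
# import fmod
```

Setting the variable in the shell before starting Python should have the same effect and avoids any import-order concerns.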

1
votes

It would be nice to be able to set the number of f2py threads via an environment variable or something. I searched around a bit, but could not find any info about doing that.

However, if you're running on Linux, say, you can use the taskset command-line utility, which provides a way to pin your process (any process) to a particular CPU core or set of CPU cores. This is a bit crude, but I think it will accomplish what you need.

For more info, look here, for instance: http://xmodulo.com/run-program-process-specific-cpu-cores-linux.html
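If you would rather do the pinning from inside the script, Python's standard library exposes the same Linux mechanism through os.sched_getaffinity and os.sched_setaffinity. The following is only a sketch: the helper name pin_to_cpus is hypothetical, it works on Linux only, and it assumes it is called before the Fortran library starts its worker threads, since threads created afterwards inherit the affinity mask of the thread that created them.

```python
import os


def pin_to_cpus(max_cpus):
    """Restrict the calling thread (and threads it starts later) to max_cpus CPUs."""
    if max_cpus < 1:
        raise ValueError("max_cpus must be at least 1")
    # Choose only from CPUs we are currently allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:max_cpus])
    # pid 0 refers to the caller itself.
    os.sched_setaffinity(0, chosen)
    return chosen


chosen = pin_to_cpus(1)
print("Pinned to CPUs:", sorted(chosen))
```

Note that, like taskset, this does not reduce the number of threads the library creates; it only limits which cores they may run on, so several threads may end up competing for the same core.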