System configuration:
Workstation with two Xeon E5-2620 v4 CPUs, CentOS 7.3.
OpenMPI 3.0.1, ifort 2015, gcc 4.8.6, Intel MKL.
I run an MPI/OpenMP hybrid program on the workstation. I want to use 1 MPI process with 8 OpenMP threads; however, the number of OpenMP threads actually used in the parallel region is always 1. On another machine with an Intel i7-9900K CPU, the number of OpenMP threads is always 2. On both machines I printed the value returned by omp_get_max_threads, and it is 8, since I had already set "export OMP_NUM_THREADS=8". It was really bothering me.
After digging for about a day, I realized it is related to the OpenMPI option "-bind-to". If "-bind-to none" or "-bind-to numa" is used, the program works fine: the CPU usage of each MPI process is 800% and an 8x speedup is obtained in the parallel region. If I use the default, which is "-bind-to core", the number of OpenMP threads is never what I expect, because each MPI process is then pinned to a single core and the OpenMP runtime only uses the hardware threads of that one core. On the workstation with the Xeon E5-2620 v4 CPUs, hyper-threading is disabled, so the number of OpenMP threads actually used is always 1. On the PC with the Intel i7-9900K, hyper-threading is enabled, so the number of OpenMP threads used is 2.
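For example, a launch line along these lines avoids the problem ("./hybrid_app" is just a placeholder for the executable name; "-x OMP_NUM_THREADS" only matters when remote nodes are involved):

    $ export OMP_NUM_THREADS=8
    $ mpirun -np 1 -bind-to none -x OMP_NUM_THREADS ./hybrid_app

"-bind-to numa" also works and additionally keeps the rank and its threads within one socket, which is usually preferable for a hybrid code on a two-socket machine.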
Moreover, if I do not use the "-bind-to none/numa" options, omp_get_num_procs returns 1. If "-bind-to none" is used, omp_get_num_procs returns the total number of processors (CPU cores), while with "-bind-to numa" it returns the number of CPU cores in one CPU (one socket).
I am posting my experience here because it may be helpful for other people who run into a similar problem.