I program in C++ and uses OpenMP for parallelization. The machine has 2 CPU sockets and 8 cores per each socket.
Since I compile with intel compiler, I set the following environment variables
export KMP_AFFINITY=verbose,scatter
With verbose option, I can see the following messages when running the binary.
[0] OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
[0] OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
[0] OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0}
[0] OMP: Info #156: KMP_AFFINITY: 1 available OS procs
[0] OMP: Info #157: KMP_AFFINITY: Uniform topology
[0] OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)
[0] OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
[0] OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 0 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 14 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 15 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 11 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 6 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 7 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 8 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 9 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 10 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 13 bound to OS proc set {0}
[0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 12 bound to OS proc set {0}
As you can see, OMP cannot detect the correct number of packages (sockets) and cores per package. As a result, all the threads are pinned to a single core.
How can I resolve this issue? Where should I start?