1 vote

I need to implement some parallel computing functionality for some computationally demanding C++ code. I have read that a combination of MPI and OpenMP can be used to achieve what I need: MPI distributes tasks between processes (which may run on different processors or machines), while OpenMP distributes work between threads within each process.

I typed lscpu (output below) to check the processor details of my office PC, but I am not sure how to interpret it. The key points appear to be the following:

  • 12 CPU(s)
  • 1 Socket
  • 6 Core(s) per socket
  • 2 Thread(s) per core

So how do I interpret this in terms of possibilities for parallelization? Specifically, how do MPI and OpenMP correspond to the items in this list? Is MPI used to distribute across the 12 CPUs, and then OpenMP across the 2 threads? But then what about cores and sockets?

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:            10
CPU MHz:             4409.872
CPU max MHz:         4700.0000
CPU min MHz:         800.0000
BogoMIPS:            7392.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            12288K
NUMA node0 CPU(s):   0-11
Why do you want to use a combination of OpenMP and MPI? It is not the easiest thing to do, and your performance will likely be worse than if you just use OpenMP. - Alain Merigot
6 cores with 2 threads per core? So you have hyperthreading turned on? OpenMP would probably use up to 12 threads for parallel loops and such unless told otherwise. You'd use MPI to divide work up among different processes, not threads of a single process. - Shawn
The sweet spot for most apps is 1 MPI task per socket (i.e. per NUMA domain) and 1 OpenMP thread per core, and have both the MPI and OpenMP runtimes bind everything. Hyperthreading generally does not help (when it does not hurt). Of course, every app is different and you should try several combinations to find the optimal one on a per-app basis. - Gilles Gouaillardet
@AlainMerigot For the moment I am just working on my office pc but ultimately this code will have to be run on a cluster and parallelized as much as possible. - electroscience

2 Answers

3 votes

MPI is used for clusters of multiple computers (shared-memory nodes). Typically, you run one MPI rank (process) per shared-memory node and use OpenMP within the node. If you are targeting a single office computer, MPI is not the first choice of programming model; most likely, you should use OpenMP exclusively.

Now there are some valid reasons to run more than one MPI process per node, e.g. because of NUMA, or because your code does not benefit from shared memory.
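If you do end up combining MPI and OpenMP later, process and thread placement is usually set at launch time. A hypothetical Open MPI invocation following the "one rank per socket, one thread per core" advice from the comments might look like this (flag names and values are illustrative; check the documentation for your MPI implementation):

```shell
# One rank per socket, each rank bound to 6 cores (pe=6),
# with 6 OpenMP threads pinned close to their parent rank.
export OMP_NUM_THREADS=6
export OMP_PROC_BIND=close
mpirun --map-by ppr:1:socket:pe=6 ./my_hybrid_app
```

On the single-socket machine from the question this would launch exactly one rank; on a dual-socket cluster node it would launch two.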

In general, if you are a beginner, focus on one parallel paradigm first and get familiar with it.

3 votes

"how do MPI and OpenMP correspond to the items in this list" - I'd say that for MPI this list is irrelevant, while OpenMP will be capable of at most 12x parallelization. But the thing is that OpenMP does not magically give your code a speed boost just by running it in parallel. Existing applications may require a complete overhaul to take advantage of multiple threads. So the proper starting point would be to figure out which of your performance bottlenecks is the easiest to parallelize, and rework them one by one. OpenMP may or may not be of any help.