I am trying to run a code using hybrid MPI-OpenMP parallelization. As far as I know, as long as the number of OpenMP threads does not exceed the number of physical processors, each thread runs on its own processor. Assuming this is true, suppose I have a hypothetical compute node consisting of two computing cards. Each computing card has chips with 4 processors and memory. My question is: what would be the optimal choice of MPI and OpenMP parameters? I would say 2 MPI processes with 4 threads each. Is this correct?
export OMP_NUM_THREADS=4
mpirun -np 2 ./code
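To make my reasoning explicit, here is a minimal sketch of how I arrive at those numbers: one MPI rank per card and one OpenMP thread per processor on that card. The variable names (`cards_per_node`, `cores_per_card`, `ranks`, `total`, `launch_cmd`) are just my own, `./code` assumes the executable sits in the current directory, and the script only prints the launch line rather than running it. I also set `OMP_PLACES` and `OMP_PROC_BIND`, which as far as I understand are the standard OpenMP environment variables for pinning threads to cores. Placing each rank on its own card is launcher-specific and is not shown here.

```shell
#!/bin/sh
# Hypothetical node layout taken from the question: 2 cards, 4 processors each.
cards_per_node=2
cores_per_card=4

# Assumption: one MPI rank per card, one OpenMP thread per processor on that card.
ranks=$cards_per_node
OMP_NUM_THREADS=$cores_per_card
export OMP_NUM_THREADS

# Standard OpenMP environment variables: one place per core, and keep
# the threads of a team on places close to the parent thread.
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Sanity check: the total thread count should equal the number of processors.
total=$((ranks * OMP_NUM_THREADS))
echo "ranks=$ranks threads_per_rank=$OMP_NUM_THREADS total_threads=$total"

# Print the launch line instead of executing it, so this runs without MPI installed.
launch_cmd="mpirun -np $ranks ./code"
echo "$launch_cmd"
```

With the layout above, the total comes out to 8 threads, one per processor, which is the case I am asking about.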
I have heard from colleagues that these parameters need to be chosen carefully, depending on the hardware layout, to get the best performance. I would appreciate any advice on running hybrid jobs.
Thanks