I'm currently working on a program that already runs in parallel using MPI. I was wondering whether I could gain additional speed in the for loops by using OpenMP, so that I get more out of each processor. Would I gain anything from doing this? And how would I go about it?
1 Answer
From experience, it really depends on your problem and on how many MPI processes you are using.
Using a large number of MPI processes usually improves data locality, but your parallelization scheme might not scale to that many processes.
The assumption that you will certainly get a decent speedup is very often wrong :-( But once you reach the point where you can't add more MPI processes because parallel efficiency drops off, adding OpenMP threads will probably let you use the remaining cores efficiently.
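As for how to go about it: the usual hybrid pattern keeps the MPI decomposition exactly as it is and adds an OpenMP pragma to the loop that each rank runs over its own local data. Below is a minimal sketch of that inner part, assuming a simple reduction loop; the function names and the sum-of-squares loop are hypothetical stand-ins for your own code. The MPI calls are left out so the snippet compiles on its own. In the real program you would initialize MPI with `MPI_Init_thread`, requesting at least `MPI_THREAD_FUNNELED` if only the master thread makes MPI calls (outside the parallel regions).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: work done by ONE MPI rank on its own slice of
   the data. The pragma splits the loop iterations among this rank's
   OpenMP threads; reduction(+:sum) gives each thread a private partial
   sum and combines them at the end, so there is no data race on `sum`.
   Built without OpenMP, the pragma is ignored and the loop runs serially
   with the same result (up to floating-point rounding order). */
double local_sum_of_squares(const double *x, long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < n; ++i) {
        sum += x[i] * x[i];
    }
    return sum;
}

/* Number of threads a parallel region will use by default (set e.g.
   through the OMP_NUM_THREADS environment variable); 1 when the code
   is compiled without OpenMP support. */
int threads_available(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Compiled with `-fopenmp` (GCC/Clang), the loop uses as many threads as `OMP_NUM_THREADS` requests; without the flag it still builds and simply runs serially, which makes it easy to compare timings per rank.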
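From experience, you should target a small number of threads per MPI process (4-8, i.e. about half the number of cores on a socket), especially if your loops are small (which they will be once you have reached the maximum useful number of MPI processes, since each process then owns only a small piece of the data).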
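The rule of thumb above turns into simple arithmetic when planning the layout of a node. This is a hypothetical sketch, assuming all cores on a node are identical and that you want each core used by exactly one thread (no oversubscription); the helper names are illustrative, not part of MPI or OpenMP.

```c
/* Hypothetical helper: threads per MPI rank following the
   "about half of a socket's cores" rule of thumb. */
int suggested_threads_per_rank(int cores_per_socket)
{
    int t = cores_per_socket / 2;
    return t > 0 ? t : 1;   /* always at least one thread */
}

/* Hypothetical helper: MPI ranks per node so that ranks * threads covers
   the node's cores once. Integer division: if threads_per_rank does not
   divide the core count, the leftover cores stay idle.
   threads_per_rank must be > 0. */
int ranks_per_node(int sockets, int cores_per_socket, int threads_per_rank)
{
    return (sockets * cores_per_socket) / threads_per_rank;
}
```

For example, a node with 2 sockets of 16 cores gives 8 threads per rank and 4 ranks per node; with 12-core sockets it gives 6 threads and again 4 ranks. You would then set `OMP_NUM_THREADS` accordingly and place the ranks with your MPI launcher's own options, which differ between implementations.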
A good introduction to hybrid MPI + OpenMP parallelism: http://www.openmp.org/press-release/sc13-tutorial-hybrid-mpi-openmp-parallel-programming/