Introduction
I want to write a hybrid MPI/pthreads code. My goal is to start one MPI process on each node and have each of those processes split into multiple threads that do the actual work, with communication happening only between the separate MPI processes.
There are quite a few tutorials describing this situation, called hybrid programming, but they typically assume a homogeneous cluster. However, the one I am using has heterogeneous nodes: they have different processors and different numbers of cores, i.e. the nodes are a combination of 4/8/12/16 core machines.
I am aware that running an MPI job across this cluster will slow my code down to the speed of the slowest CPU used; I accept that. With that out of the way, here is my question.
Is there a way to start N MPI processes -- with one MPI process per node -- and let each know how many physical cores are available to it at that node?
The MPI implementation I have access to is Open MPI. The nodes are a mix of Intel and AMD CPUs. I thought of using a machinefile that lists each node with one slot, then figuring out the number of cores locally. However, there seem to be problems with doing that. I am surely not the first person with this problem, but searching the web has not pointed me in the right direction yet. Is there a standard way of solving this problem other than finding oneself a homogeneous cluster?
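To make the "figure out the number of cores locally" idea concrete, here is a minimal sketch of what I have in mind for the per-node part. It is only a sketch under some assumptions: `online_cpu_count`, `run_workers`, `worker` and `worker_arg` are names I made up for illustration, and `sysconf(_SC_NPROCESSORS_ONLN)` is a widely available extension (Linux, the BSDs, macOS) rather than strict POSIX. It also reports logical CPUs that are online, so on nodes with hardware multithreading it will be larger than the physical core count I actually asked about, and as far as I know it does not take into account any core binding the MPI launcher may have applied to the process.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: number of logical CPUs online on this node.
 * Assumption: _SC_NPROCESSORS_ONLN is available on the platform.
 * This counts hardware threads, NOT physical cores. */
int online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;   /* fall back to 1 if the query fails */
}

/* Per-thread bookkeeping (illustrative only). */
typedef struct {
    int id;        /* thread index within this process */
    int nthreads;  /* total number of threads in this process */
    int done;      /* set to 1 once the worker has run */
} worker_arg;

/* Placeholder worker: the real computation would go here. */
void *worker(void *p)
{
    worker_arg *arg = p;
    arg->done = 1;
    return NULL;
}

/* Hypothetical helper: start nthreads workers, wait for all of them,
 * and return how many ran to completion (-1 on bad input or no memory). */
int run_workers(int nthreads)
{
    if (nthreads < 1)
        return -1;

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    if (tid == NULL || args == NULL) {
        free(tid);
        free(args);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i].id = i;
        args[i].nthreads = nthreads;
        args[i].done = 0;
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
            break;               /* stop creating threads on first failure */
        started++;
    }

    int finished = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        finished += args[i].done;
    }

    free(tid);
    free(args);
    return finished;
}
```

In the hybrid program, each MPI process would call `run_workers(online_cpu_count())` after `MPI_Init`, so a 4-core node would start 4 threads and a 16-core node 16, with only the main thread of each process making MPI calls. Compile with `-pthread`.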