I'm implementing a standard MPI master/slave system: a master distributes the work, and the slaves request chunks and process the data.
However, if this is implemented naively (rank 0 is the master, all other ranks are slaves), the master does no real work yet still occupies a whole core for a task that needs practically no computing power. I tried running a separate "scheduler" thread inside the master process, but that meant the process had to send MPI messages to itself, and I couldn't get it to work.
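To make the naive layout concrete, here is a toy model of the request/reply protocol. It is only a sketch, not my actual MPI code: plain Python threads and queues stand in for MPI ranks and messages, and `run_naive`, the chunk count, and the squaring "work" are all made up for illustration. It shows that the master loop only hands out chunk indices and never computes anything itself.

```python
import queue
import threading


def run_naive(n_slaves=3, n_chunks=10):
    """Toy model: 'rank 0' is the master loop below, ranks 1..n_slaves are slaves."""
    requests = queue.Queue()  # slaves -> master: "give me a chunk"
    replies = {r: queue.Queue() for r in range(1, n_slaves + 1)}  # master -> one slave
    results = {}                            # chunk index -> processed value
    processed_by = {r: 0 for r in replies}  # chunks handled per slave

    def slave(rank):
        while True:
            requests.put(rank)           # ask the master for a chunk
            chunk = replies[rank].get()  # block until the master answers
            if chunk is None:            # None means "no work left"
                return
            results[chunk] = chunk * chunk  # stand-in for the real processing
            processed_by[rank] += 1

    threads = [threading.Thread(target=slave, args=(r,)) for r in replies]
    for t in threads:
        t.start()

    # Master loop: only bookkeeping, no computation.
    next_chunk, stopped = 0, 0
    while stopped < n_slaves:
        rank = requests.get()              # wait for any slave's request
        if next_chunk < n_chunks:
            replies[rank].put(next_chunk)  # hand out the next chunk index
            next_chunk += 1
        else:
            replies[rank].put(None)        # tell this slave to stop
            stopped += 1

    for t in threads:
        t.join()
    return results, processed_by
```

In the real MPI version the master loop above is a whole rank, so it takes up a core while spending nearly all of its time blocked waiting for requests.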
Do you have any ideas on how to solve this, so that no core is wasted on the master?