I have a large-scale code that runs on many CPU cores, potentially spread across a number of compute nodes. The code is written in C++ and is parallelized with Open MPI.
My code has a very large object (~10 GB of RAM) that is read by every MPI process. The object is updated only very occasionally (the update can be done by a single process that simply reads in a data file).
What I've been doing so far is giving each MPI process its own copy of this object, but that leaves me severely RAM-limited and unable to use the full CPU power of my nodes. So I've been reading about the shared-memory support in the MPI-3 specification.
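For reference, the basic MPI-3 pattern from the examples I've found looks roughly like this (a minimal sketch, not my actual code; the window size and element type are placeholders): the communicator is split per node, rank 0 on each node allocates the window, and the other ranks query its base address.

```cpp
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Communicator containing only the ranks on this shared-memory node.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    // Only rank 0 on each node requests actual memory (1 MiB here as a
    // stand-in for the real ~10 GB); everyone else passes size 0 and
    // attaches to rank 0's segment.
    const MPI_Aint nbytes = (node_rank == 0) ? 1024 * 1024 : 0;
    double* base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(nbytes, sizeof(double), MPI_INFO_NULL,
                            node_comm, &base, &win);

    // Non-zero ranks query the base address of rank 0's segment, which
    // may differ from process to process.
    if (node_rank != 0) {
        MPI_Aint size;
        int disp_unit;
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &base);
    }

    // ... all ranks on the node can now read through 'base' ...

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```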
My question is: what is the best way to share a complex object across MPI processes? In every example I can find, MPI shared-memory windows are created and used to exchange simple data (floats, arrays of ints, etc.). My global object is a custom class type with a number of member variables, some of which are pointers and many of which are themselves complex class types. So I suspect I can't simply call MPI_Win_allocate_shared and pass in the address of my complex object, especially since I want to share everything the member variables refer to (in particular, the underlying values behind the pointer-type members) - i.e. share a "deep copy" across MPI processes, with all virtual-memory addresses valid in each process.
Is it possible to achieve this "deep sharing" with MPI shared memory, and if so, is there a "best practice" for doing so? Or would another library (e.g. Boost.Interprocess) make this more feasible/straightforward?
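From what I can tell, the Boost.Interprocess version of what I want would look something like the sketch below (the segment name, size, and vector payload are placeholders for my real object). Its containers use offset_ptr internally, so the segment can be mapped at a different virtual address in each process and the internal "pointers" still resolve correctly.

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>

namespace bip = boost::interprocess;

using ShmAllocator =
    bip::allocator<double, bip::managed_shared_memory::segment_manager>;
using ShmVector = bip::vector<double, ShmAllocator>;

int main() {
    // One process (e.g. rank 0 on the node) creates the segment; the
    // 64 MiB size here is a scaled-down stand-in for the real object.
    // Cleanup via shared_memory_object::remove is omitted in this sketch.
    bip::managed_shared_memory segment(bip::open_or_create,
                                       "MyGlobalObject", 64 * 1024 * 1024);

    ShmAllocator alloc(segment.get_segment_manager());
    ShmVector* data =
        segment.find_or_construct<ShmVector>("payload")(alloc);
    if (data->empty()) {
        data->resize(1000, 3.14);  // stand-in for the real load step
    }

    // Any other process on the node just opens the segment and looks
    // the object up by name:
    // bip::managed_shared_memory other(bip::open_only, "MyGlobalObject");
    // ShmVector* view = other.find<ShmVector>("payload").first;

    return 0;
}
```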
P.S. If I can't figure out a good solution, I will resort to a hybrid MPI+pthreads approach, where I know I can easily keep a single copy of this global object per node, shared among the threads. But I'm really hoping to find an elegant MPI-only solution.
Mapping the segment at the same virtual address in every process is possible with mmap() with a preferred address, but that is neither portable nor guaranteed to work each and every time. The proper solution is to use relative pointers and add the base address to the value of each pointer before dereferencing it. – Hristo Iliev

MPI_Win_allocate_shared cannot be used when the process group of the underlying communicator spans more than one shared-memory node. The usual MPI RMA should be used instead in such cases. – Hristo Iliev
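To make the relative-pointer suggestion concrete, here is a minimal sketch (the SharedNode struct and helper functions are illustrative, not part of any MPI API): the shared segment stores byte offsets from the window base instead of raw addresses, and each process adds its own base address before dereferencing.

```cpp
#include <cstddef>
#include <cstdint>

// Lives inside the shared window; must contain no absolute pointers.
struct SharedNode {
    double      value;
    std::size_t next_offset;  // offset of the next node, or 0 for "null"
};

// Each process resolves an offset against the base address it obtained
// from MPI_Win_allocate_shared / MPI_Win_shared_query.
inline SharedNode* resolve(char* base, std::size_t offset) {
    return offset ? reinterpret_cast<SharedNode*>(base + offset) : nullptr;
}

// Writer side: link nodes by offset, never by address.
inline void link(char* base, SharedNode* from, SharedNode* to) {
    from->next_offset =
        static_cast<std::size_t>(reinterpret_cast<char*>(to) - base);
}
```

With this layout the same bytes are valid in every process regardless of where the window happens to be mapped, which is exactly what raw pointers cannot guarantee.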