I'd like to copy an (n x n) matrix, which is distributed over a (p x q) grid of processes, to all processes, so that each process ends up with the whole (n x n) matrix, similar to an allgather operation in MPI.
I understand that ScaLAPACK's pdgemr2d routine is the way to go, but the examples and documentation did not help me figure it out. My idea is to introduce a second BLACS context consisting of only one process, which is also the MPI root. pdgemr2d copies all the data to this 1x1 grid, and the root then broadcasts it to all other processes.
I am using the Fortran interface of ScaLAPACK/BLACS.
Here are my questions:
- Is the idea stated above sane, or is there a (canonical) way with better performance?
- There are a lot of contexts involved here, and I am not sure I am keeping them apart correctly: all of my p x q processes are in MPI_COMM_WORLD, and this communicator is also used as the BLACS context for the grid. The root is then part of MPI_COMM_WORLD, the grid context, and the 1x1 context. So it holds a chunk of data that somehow has to be sent from the p x q context to the 1x1 context. Is this correct, and does it even work?
- The last argument of pdgemr2d is ictxt, which is supposed to be a context containing the union of all participating processes. Is this MPI_COMM_WORLD (i.e., the context of the full p x q grid)?
- Do I need different calls for the members of the p x q grid and the one member of the 1x1 grid? And if so, what should the difference be?
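To make this concrete, here is a sketch of the approach I have in mind. The matrix and grid sizes are example values, and I am not sure the handling of the second descriptor on processes outside the 1x1 grid (setting its context entry, descB(2), to -1) is correct:

```fortran
! Sketch, not working code: n, nb, nprow, npcol are example values,
! and a_loc is not filled with actual data here.
program gather_all
  use mpi
  implicit none
  integer :: ictxt_grid, ictxt_one
  integer :: nprow, npcol, myrow, mycol
  integer :: n, nb, lld, info, ierr, rank
  integer :: descA(9), descB(9)
  double precision, allocatable :: a_loc(:,:), a_full(:,:)
  integer, external :: numroc

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)
  n = 8; nb = 2; nprow = 2; npcol = 2        ! example sizes

  ! Context 1: the working p x q grid (contains all processes).
  call blacs_get(-1, 0, ictxt_grid)
  call blacs_gridinit(ictxt_grid, 'R', nprow, npcol)
  call blacs_gridinfo(ictxt_grid, nprow, npcol, myrow, mycol)

  ! Context 2: a 1 x 1 grid; only the first process ends up in it.
  call blacs_get(-1, 0, ictxt_one)
  call blacs_gridinit(ictxt_one, 'R', 1, 1)

  ! Descriptor of the block-cyclically distributed matrix.
  lld = max(1, numroc(n, nb, myrow, 0, nprow))
  allocate(a_loc(lld, max(1, numroc(n, nb, mycol, 0, npcol))))
  call descinit(descA, n, n, nb, nb, 0, 0, ictxt_grid, lld, info)

  ! Descriptor of the full copy; processes outside the 1 x 1 grid
  ! mark the context entry as invalid (descB(2) = -1) -- this is the
  ! part I am unsure about.
  allocate(a_full(n, n))
  if (rank == 0) then
     call descinit(descB, n, n, n, n, 0, 0, ictxt_one, n, info)
  else
     descB(2) = -1
  end if

  ! Redistribute p x q -> 1 x 1. The last argument must be a context
  ! containing every calling process; here the p x q grid qualifies.
  call pdgemr2d(n, n, a_loc, 1, 1, descA, a_full, 1, 1, descB, ictxt_grid)

  ! Rank 0 now holds the whole matrix; broadcast it to everyone.
  call mpi_bcast(a_full, n*n, mpi_double_precision, 0, mpi_comm_world, ierr)

  call blacs_gridexit(ictxt_grid)
  if (rank == 0) call blacs_gridexit(ictxt_one)
  call mpi_finalize(ierr)
end program gather_all
```

In particular, I am unsure whether all processes or only rank 0 should initialize the 1x1 context, and whether passing ictxt_grid as the last argument of pdgemr2d is what "context containing all participating processes" means.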