If I understand your question right, you would like to have simultaneous row-wise broadcasts:
P00 -> P01 & P02
P10 -> P11 & P12
P20 -> P21 & P22
This could be done using subcommunicators, e.g. one that contains only the processes from row 0, another that contains only the processes from row 1, and so on. Then you can issue simultaneous broadcasts in each subcommunicator by calling MPI_Bcast with the appropriate communicator argument.
Creating row-wise subcommunicators is extremely easy if you use a Cartesian communicator in the first place. MPI provides the MPI_CART_SUB operation for that. It works like this:
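/* 3x3 non-periodic process grid; reorder = 0 keeps the MPI_COMM_WORLD ranks */
int dims[2] = { 3, 3 };
int periods[2] = { 0, 0 };
MPI_Comm comm_cart;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm_cart);
/* drop dimension 0 (rows), keep dimension 1 (columns): one subgrid per row */
int remaindims[2] = { 0, 1 };
MPI_Comm comm_row;
MPI_Cart_sub(comm_cart, remaindims, &comm_row);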
Now comm_row will contain a handle to a new subcommunicator that spans only the row that the calling process is in. It now takes only a single call to MPI_Bcast to perform three simultaneous row-wise broadcasts:
MPI_Bcast(&data, data_count, MPI_DATATYPE, 0, comm_row);
This works because comm_row, as returned by MPI_Cart_sub, will be different in processes located in different rows. The 0 here is the rank of the first process in the comm_row subcommunicator, which corresponds to P*0 because of the way the topology was constructed.
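To see why rank 0 of each comm_row is the P*0 process, here is a small plain-C model of the rank arithmetic involved. It makes no MPI calls: model_cart_coords and model_sub_rank are hypothetical helpers written for illustration, and they assume the row-major rank ordering of Cartesian topologies together with reorder = 0, as in the snippet above.

```c
#include <assert.h>

/* Hypothetical helper (not an MPI call): row-major rank -> Cartesian
   coordinates, the last dimension varying fastest. */
void model_cart_coords(int rank, int ndims, const int dims[], int coords[])
{
    assert(ndims > 0 && rank >= 0);
    for (int d = ndims - 1; d >= 0; --d) {
        coords[d] = rank % dims[d];
        rank /= dims[d];
    }
}

/* Hypothetical helper (not an MPI call): rank of a process inside the
   subgrid that keeps only the dimensions flagged in remain_dims,
   numbered row-major over the kept dimensions. */
int model_sub_rank(int ndims, const int dims[], const int remain_dims[],
                   const int coords[])
{
    int r = 0;
    for (int d = 0; d < ndims; ++d) {
        if (remain_dims[d])
            r = r * dims[d] + coords[d];
    }
    return r;
}
```

With dims = { 3, 3 } and remaindims = { 0, 1 }, the modelled rank inside the row is simply the column coordinate, so the processes in column 0 are the broadcast roots.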
If you do not use a Cartesian communicator but operate on MPI_COMM_WORLD instead, you can use MPI_COMM_SPLIT to split the world communicator into three row-wise subcommunicators. MPI_COMM_SPLIT takes a color that is used to group processes into new subcommunicators: processes with the same color end up in the same subcommunicator. In your case color should equal the number of the row that the calling process is in. The splitting operation also takes a key that is used to order processes in the new subcommunicator. It should equal the number of the column that the calling process is in, e.g.:
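int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int proc_row = rank / 3;   /* color: row index in the 3x3 grid */
int proc_col = rank % 3;   /* key: column index, orders ranks within the row */
MPI_Comm comm_row;
MPI_Comm_split(MPI_COMM_WORLD, proc_row, proc_col, &comm_row);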
Once again comm_row will contain the handle of a subcommunicator that spans only the same row as the calling process.
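If it helps to check the color/key choice before running on a cluster, the grouping rule can be modelled in plain C. This is only a sketch of my understanding of the rule (same color means same subcommunicator, ordered by key, ties broken by the rank in the old communicator); model_split_rank is a hypothetical helper, not part of MPI.

```c
#include <assert.h>

/* Hypothetical helper (not an MPI call): the rank that process `me` would
   get in its new subcommunicator, given every process's color and key.
   It counts the processes of the same color that sort before `me`. */
int model_split_rank(int me, int nprocs, const int color[], const int key[])
{
    assert(me >= 0 && me < nprocs);
    int new_rank = 0;
    for (int p = 0; p < nprocs; ++p) {
        if (p == me || color[p] != color[me])
            continue;                      /* different subcommunicator */
        if (key[p] < key[me] || (key[p] == key[me] && p < me))
            ++new_rank;                    /* p is ordered before me */
    }
    return new_rank;
}
```

Filling color[r] = r / 3 and key[r] = r % 3 for nine processes gives world ranks 0, 3 and 6 the new rank 0, so the same MPI_Bcast call with root 0 works here too.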