3
votes

I'm trying to implement an All-to-All (i.e. MPI_Alltoall) operation on a hypercube network using C++.

For example, for n (i.e. the number of processors) = 8, I store the initial data as

p0: [00, 01, 02, ..., 07]; 
p1: [10, 11, 12, ..., 17],
...
... 
p7: [70, 71, 72, ..., 77]. 

Eventually after running All-To-All, data should become

p0: [00, 10, 20, ..., 70], 
p1: [01, 11, 21, ..., 71],
..., 
p7: [07, 17, 27, ..., 77]. 

(In other words, every processor grabs data from everyone else).

I thought of an algorithm that uses a mask and a loop, where each step swaps data between two processors, e.g., swapping the last 4 elements of p0 with the first 4 elements of p3 (sending the last 4 elements of p0 to p3 while, at the same time, p3 sends its first 4 elements to p0). A plain MPI_Send followed by MPI_Recv cannot achieve this, because the receiver's half of the array would be overwritten before it has sent out its own data. Could anyone suggest techniques I could use to do this? I thought about using an intermediate buffer, but I'm still not sure exactly how to write the send and receive MPI code.

Or, if someone can tell me any other way to implement All-to-All, I would really appreciate it. Thank you very much!


1 Answer

1
votes

All-to-all in MPI is performed by MPI_ALLTOALL or MPI_ALLTOALLV. The regular calls require two distinct buffers for sending and receiving data. The MPI standard also defines an "in place" option for both operations. In your case this code should do it:

double p[8];

MPI_Alltoall(MPI_IN_PLACE, 1, MPI_DOUBLE,  // send count and datatype are ignored
             p, 1, MPI_DOUBLE,
             MPI_COMM_WORLD);

Unfortunately some MPI implementations do not support this "in place" mode. One notable example is Open MPI. MPICH2 supports it.

Here is one way to implement it: MPICH2 alltoall.c