1 vote

The MPI standard states that once a buffer has been given to a non-blocking communication function, the application is not allowed to use it until the operation has completed (i.e., until after a successful TEST or WAIT function).
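
For example, as I understand it, the rule for a single non-blocking send looks like this (a minimal sketch; dest and comm stand for whatever destination rank and communicator are in use):

int buf[100];
MPI_Request req;

MPI_Isend(buf, 100, MPI_INT, dest, 0, comm, &req);
/* buf must not be modified here: MPI still owns it until completion */
MPI_Wait(&req, MPI_STATUS_IGNORE);
/* now it is safe to reuse buf */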

Does this also apply to the situation below?

I have a buffer, and each part of it corresponds to a different processor, except that one part is simply copied from data already available on the processor itself.

Am I allowed, on every processor, to MPI_Irecv the different parts of the buffer from the other processors, process the part that is already available locally, then MPI_Isend the data that should go to the others, do my other computations, and finally MPI_Waitall so that my sends and receives complete?

n = 0;

/* Post non-blocking receives for every remote rank's chunk */
for (i = 0; i < size; i++) {
    if (i != rank) {
        MPI_Irecv(&recvdata[i*100], 100, MPI_INT, i, i, comm, &requests[n]);
        n++;
    }
}

/* Work on the locally available chunk while communication proceeds */
process(&recvdata[rank*100], 100);

/* Post non-blocking sends for the chunks destined for the other ranks */
for (i = 0; i < size; i++) {
    if (i != rank) {
        MPI_Isend(&senddata[i*100], 100, MPI_INT, i, rank, comm, &requests[n]);
        n++;
    }
}

MPI_Waitall(n, requests, statuses);
As long as senddata[] and recvdata[] do not overlap, the code is perfectly fine. - Hristo Iliev

2 Answers

4 votes

I'm not 100% sure I understand what you're asking, so I'll restate the question first:

If I have a large array of data, can I create nonblocking calls to receive data from subsets of the array and then send the data back out to other processes?

The answer to that is yes, as long as you synchronize between the receives and the sends. Remember that the data from the MPI_IRECV won't have arrived until you've completed the request with MPI_WAIT (or a successful MPI_TEST), so you can't send it on to another process until that has happened. Otherwise, the sends will push out whatever garbage happens to be in the buffer at the time.

So your code can look like this and be safe:

for (i = 0; i < size; i++)
    MPI_Irecv(&data[i*100], 100, MPI_INT, i, 0, comm, &requests[i]);

/* No touching data in here */

MPI_Waitall(size, requests, statuses);

/* You can touch data here */

for (i = 0; i < size; i++)
    MPI_Isend(&data[i*100], 100, MPI_INT, (i + 1) % size, 0, comm, &requests[i]); /* (i + 1) % size is just an example; send wherever you want */

/* No touching data in here either */

MPI_Waitall(size, requests, statuses);
3 votes

Throughout the MPI standard the term locations is used rather than the term variables, precisely in order to prevent such confusion. The MPI library does not care where the memory comes from as long as outstanding MPI operations operate on disjoint sets of memory locations. Different memory locations could be different variables or different elements of one big array. In fact, the whole process memory can be thought of as one big anonymous array of bytes.

In many cases, it is possible to achieve the same memory layout with different sets of variable declarations. For example, with most x86/x64 C/C++ compilers the following two sets of local variable declarations result in the same stack layout:

int a, b;             int d[3];
int c;             

|     ....     |      |     ....     |    |
+--------------+      +--------------+    |
|      a       |      |     d[2]     |    |
+--------------+      +--------------+    |  lower addresses
|      b       |      |     d[1]     |    |
+--------------+      +--------------+    |
|      c       |      |     d[0]     |   \|/
+--------------+      +--------------+    V

In that case:

int a, b;
int c;

MPI_Irecv(&a, 1, MPI_INT, ..., &req[0]);
MPI_Irecv(&c, 1, MPI_INT, ..., &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

is equivalent to:

int d[3];

MPI_Irecv(&d[2], 1, MPI_INT, ..., &req[0]);
MPI_Irecv(&d[0], 1, MPI_INT, ..., &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

In the second case, although d[0] and d[2] belong to the same variable, &d[0] and &d[2], taken together with the ..., 1, MPI_INT, ... arguments, specify different and disjoint memory locations.

In any case, make sure that you are not simultaneously reading from and writing into the same memory location.
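
To illustrate, here is a sketch (reusing the d[3] declaration from above) of the kind of overlap that the disjointness requirement forbids:

int d[3];

/* Erroneous: the two outstanding receives both cover d[1] */
MPI_Irecv(&d[0], 2, MPI_INT, ..., &req[0]);   /* covers d[0] and d[1] */
MPI_Irecv(&d[1], 2, MPI_INT, ..., &req[1]);   /* covers d[1] and d[2] */
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

Each call on its own is fine; it is the intersection at d[1], which both pending receives would write into, that makes them operate on non-disjoint memory locations.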

A somewhat more complex version of the example given by Wesley Bland follows. It overlaps the processing and the sends with the still-outstanding receives by using MPI_Waitsome instead:

MPI_Request rreqs[size], sreqs[size];

/* Post all receives up front */
for (i = 0; i < size; i++)
    MPI_Irecv(&data[i*100], 100, MPI_INT, i, 0, comm, &rreqs[i]);

while (1)
{
    int done_idx[size], numdone;

    /* Block until at least one receive finishes; numdone becomes
       MPI_UNDEFINED once no active receive requests remain */
    MPI_Waitsome(size, rreqs, &numdone, done_idx, MPI_STATUSES_IGNORE);
    if (numdone == MPI_UNDEFINED)
        break;

    /* Process and send back each chunk as soon as it has arrived */
    for (i = 0; i < numdone; i++)
    {
        int id = done_idx[i];
        process(&data[id*100], 100);
        MPI_Isend(&data[id*100], 100, MPI_INT, id, 0, comm, &sreqs[id]);
    }
}

MPI_Waitall(size, sreqs, MPI_STATUSES_IGNORE);

In that particular case, using size separate arrays could result in somewhat more readable code.
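
Such a variant could look something like this: a sketch with one separately allocated 100-element buffer per peer, and otherwise the same structure as the loop above:

int *chunks[size];
MPI_Request rreqs[size], sreqs[size];

for (i = 0; i < size; i++)
{
    chunks[i] = malloc(100 * sizeof(int));   /* one private buffer per peer */
    MPI_Irecv(chunks[i], 100, MPI_INT, i, 0, comm, &rreqs[i]);
}

while (1)
{
    int done_idx[size], numdone;

    MPI_Waitsome(size, rreqs, &numdone, done_idx, MPI_STATUSES_IGNORE);
    if (numdone == MPI_UNDEFINED)
        break;

    for (i = 0; i < numdone; i++)
    {
        int id = done_idx[i];
        process(chunks[id], 100);             /* no index arithmetic needed */
        MPI_Isend(chunks[id], 100, MPI_INT, id, 0, comm, &sreqs[id]);
    }
}

MPI_Waitall(size, sreqs, MPI_STATUSES_IGNORE);

for (i = 0; i < size; i++)
    free(chunks[i]);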