1
vote

I know that nonblocking communication is not blocking the code, such as MPI_Isend will send the data immediately. But when we have some all-to-all communication and we need to read this data, we need to use MPI_Wait to get the data first.

For all-to-all communication, the first thing that comes to my mind is something like this (it is from real code):

1- initialise MPI, get the rank and size...
2- create the data or read the first data from a file
3- in a for loop, send the data from every rank to all the others, by MPI_Isend or even MPI_Bcast
4- finalise MPI

To write the for loop, we need to use MPI_Wait for both sending and receiving. My question is how we can use overlap with nonblocking communication.

I want to use MPI_Isend and MPI_Irecv twice in each loop iteration, to overlap some computation on the first received data while doing another send and receive, but this approach needs 4 waits. Is there any algorithm for overlapping nonblocking communications?

1 Answer

1
vote

I know that nonblocking communication is not blocking the code, such as MPI_Isend will send the data immediately.

This is not accurate: the data is not sent immediately. Let me explain in more detail:

MPI_Irecv and MPI_Isend are nonblocking communication routines; therefore, one needs to use MPI_Wait (or MPI_Test, to test for the completion of the request) to ensure that the message has completed, and that the data in the send/receive buffer can again be safely manipulated.

Nonblocking here means that one does not wait for the data to be read and sent; rather, the data is immediately made available to be read and sent. That does not imply, however, that the data is sent immediately. If it were, there would be no need to call MPI_Wait.
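A minimal sketch of this semantics, assuming two ranks exchanging a single int: both calls return immediately, but the buffers are only safe to reuse after the wait completes.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int send_buf = rank, recv_buf = -1;
    MPI_Request reqs[2];

    if (size >= 2 && rank < 2) {
        int partner = 1 - rank;
        /* Both calls return immediately; the transfer may still be in flight. */
        MPI_Isend(&send_buf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recv_buf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Here it is NOT safe to modify send_buf or read recv_buf. */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        /* Now both buffers are safe to read and modify again. */
        printf("rank %d received %d\n", rank, recv_buf);
    }

    MPI_Finalize();
    return 0;
}
```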

But when we have some all-to-all communication and we need to read this data, we need to use MPI_Wait to get the data first.

You need to call MPI_Wait (or MPI_Test) in all communication types, 1-to-1, 1-to-n, and so on, not just in all-to-all. Calling MPI_Wait is not about getting the data first; rather, during the MPI_Isend the content of the buffer (e.g., the array of ints) has to be read and sent. In the meantime, one can overlap some computation with the ongoing communication; however, this computation cannot change (or read) the content of the send/recv buffer. One then calls MPI_Wait to ensure that, from that point onwards, the data sent/received can be safely read and modified without any issues.
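To illustrate the overlap, a sketch assuming a ring exchange (`compute_on` and `work_buf` are illustrative names, not part of MPI): the computation runs on a buffer disjoint from the message buffers, and a single MPI_Waitall completes any number of pending requests, so even the "4 waits" collapse into one call.

```c
#include <mpi.h>

#define N 1024

/* Hypothetical local work that is independent of the message buffers. */
static void compute_on(double *data, int n) {
    for (int i = 0; i < n; i++) data[i] *= 2.0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double send_buf[N], recv_buf[N], work_buf[N];
    for (int i = 0; i < N; i++) { send_buf[i] = rank; work_buf[i] = i; }

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    MPI_Isend(send_buf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_buf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[1]);

    /* Overlap: safe, because work_buf is disjoint from send_buf/recv_buf. */
    compute_on(work_buf, N);

    /* One call completes all pending requests at once. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}
```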

I want to use MPI_Isend and MPI_Irecv twice in each loop iteration, to overlap some computation on the first received data while doing another send and receive, but this approach needs 4 waits. Is there any algorithm for overlapping nonblocking communications?

Whenever possible, just use the built-in communication routines; they are likely more optimized than what one would be able to write by hand. What you have described fits perfectly the MPI routine MPI_Ialltoall:

Sends data from all to all processes in a nonblocking way
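A sketch of how it could be used (here each rank sends one int to every other rank; the filler values are just for illustration):

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One int per destination rank. */
    int *sendbuf = malloc(size * sizeof(int));
    int *recvbuf = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) sendbuf[i] = rank * 100 + i;

    MPI_Request req;
    /* Nonblocking all-to-all: returns immediately with a single request. */
    MPI_Ialltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT,
                  MPI_COMM_WORLD, &req);

    /* ... overlap computation that does not touch sendbuf/recvbuf ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* recvbuf[i] now holds the value rank i sent to this rank. */

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

So instead of managing several MPI_Isend/MPI_Irecv pairs and their four waits, the whole exchange completes with one request and one MPI_Wait.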