I know that nonblocking communication does not block the code. MPI_Isend, for example, returns immediately instead of waiting until the data has been sent. But in all-to-all communication, before we can read the received data we have to call MPI_Wait to make sure it has actually arrived.
For all-to-all communication, the first approach that comes to mind is something like this (it is from real code):
1- Initialise MPI and get the rank and size.
2- Create the data, or read the initial data from a file.
3- In a for loop, send the data from every rank to all the others, using MPI_Isend or even MPI_Bcast.
4- Finalise MPI.
Inside the for loop, we need to call MPI_Wait for both the sends and the receives. My question is: how can we use nonblocking calls to overlap communication with computation?
I want to call MPI_Isend and MPI_Irecv twice in each loop iteration. That way I can compute on the data from the first receive while the second send and receive are in progress. But this approach needs four waits. Is there an algorithm for overlapping nonblocking communication with computation?