To optimize MPI communication, it is important to understand the flow of the whole communication process. This is rather straightforward for blocking (synchronous) communication, but what about nonblocking (asynchronous) communication? As I understand it, it works in one of these two ways (I have sketched each sequence in code below):
- Rank0 and Rank1 each call MPI_Isend to send a message to the other
- Rank0 and Rank1 each call MPI_Irecv to receive the other's message
- Rank0 and Rank1 do some computation
- The messages are dispatched to their respective destinations
- A matching Irecv is found -> the message is written directly into the posted recv-buffer
- Rank0 and Rank1 finish their computation and call MPI_Wait on both the send and the receive request
- MPI_Wait returns -> the communication is complete
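For concreteness, here is how I picture the first sequence in code. This is a minimal sketch assuming exactly two ranks; the buffer names, message size, and the use of MPI_Waitall (rather than two separate MPI_Wait calls) are my own choices:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int peer = 1 - rank;  /* assumes exactly two ranks */
    double sendbuf = (double)rank, recvbuf = 0.0;
    MPI_Request reqs[2];

    /* Both ranks post the nonblocking send and receive up front. */
    MPI_Isend(&sendbuf, 1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvbuf, 1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... computation that may overlap with the communication ... */

    /* Wait on both the send and the receive request. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}
```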
or
- Rank0 and Rank1 each call MPI_Isend to send a message to the other (no receives are posted yet)
- Rank0 and Rank1 do some computation
- The messages are dispatched to their respective destinations
- No matching receive has been posted -> the library allocates its own temporary buffer and writes the message into that
- Rank0 and Rank1 finish their computation and call the blocking MPI_Recv
- The MPI_Recv matches the buffered message -> the contents of the temporary buffer are copied into the recv-buffer
- Rank0 and Rank1 call MPI_Wait on their send requests
- MPI_Wait returns -> the communication is complete and the temporary buffer is freed
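And the second sequence, again as a minimal two-rank sketch with illustrative names. Here only the send is nonblocking, and the receive is the blocking MPI_Recv posted after the computation:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int peer = 1 - rank;  /* assumes exactly two ranks */
    double sendbuf = (double)rank, recvbuf = 0.0;
    MPI_Request send_req;

    /* Only the send is posted nonblocking; no receive exists yet. */
    MPI_Isend(&sendbuf, 1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &send_req);

    /* ... computation; an arriving message has no matching receive,
       so (in my understanding) it lands in a temporary buffer ... */

    /* The blocking receive matches the buffered message (or waits for it). */
    MPI_Recv(&recvbuf, 1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    /* Complete the nonblocking send. */
    MPI_Wait(&send_req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```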
Is this correct? Do I need to be aware of anything else that runs in the background of MPI in order to optimize its usage?