So what you are saying is: you have some array A that lives on P1. After A is computed in the first step, both P0 and P1 need to work independently with it.
In fact, this is far from enough information to decide on the right call. A few points to consider:
- If the computation of A in the first step is cheap (cheaper than the communication would be), or if P0 is idle while P1 computes A, you can simply compute A on both processors independently.
- The computation in the second step needs to be large enough that it is worth sending the data to another processor at all.
- If the computations on P0 and P1 are of roughly the same size, you can just use MPI_Send (blocking), since I expect there is another synchronization point later anyway.
- I would prefer manually copying the array (only the part that is really needed) and using the asynchronous MPI calls (MPI_Isend / MPI_Irecv) over MPI_Bsend, whose use is generally discouraged; see the sketch after this list.
- Use Bsend, Ssend, and the other variants only if there is a good reason to. Which call performs best depends heavily on the MPI implementation and the network. The only decision you should make up front is whether you can use asynchronous MPI or not; think about the other send modes only if it is really necessary.
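
To make that concrete, here is a minimal sketch of the Isend/Irecv variant. The array length `N`, the tag, and the placeholder computations are assumptions for illustration, not taken from your code:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1000;               /* assumed array length */
    double *A = malloc(N * sizeof *A);
    MPI_Request req;

    if (rank == 1) {
        /* step 1: compute A (placeholder computation) */
        for (int i = 0; i < N; ++i) A[i] = i;
        /* hand the data to P0 without blocking; A must not be
         * modified until the request has completed */
        MPI_Isend(A, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        /* ... step-2 work on P1 that does not write to A ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 0) {
        MPI_Irecv(A, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        /* ... any P0 work that does not need A yet ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* step 2: use A */
    }

    free(A);
    MPI_Finalize();
    return 0;
}
```

The point of the nonblocking calls is that the transfer can overlap with whatever each rank can do while the data is in flight; the MPI_Wait marks the moment the buffer is actually safe to reuse.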
As for MPI_Sendrecv: what would be the point of using it for this problem? It is useful when P0 sends something to P1 and, at the same time, P1 sends something to P0. That is not what you describe, so it is almost certainly the wrong function here, which is why I recommend you start with MPI_Send.
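
For contrast, here is a minimal sketch of the exchange pattern MPI_Sendrecv is actually designed for (buffer sizes and payload are made up): both ranks send *and* receive in a single call, which avoids the deadlock risk of two blocking sends facing each other.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double out[100], in[100];
    for (int i = 0; i < 100; ++i) out[i] = rank;  /* dummy payload */

    int partner = 1 - rank;  /* assumes exactly two ranks, P0 and P1 */
    MPI_Sendrecv(out, 100, MPI_DOUBLE, partner, 0,
                 in,  100, MPI_DOUBLE, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```

In your case only one direction of this exchange exists, so the combined call buys you nothing.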