I'm new to MPI, so go easy on me. I'm trying to use MPI_Isend and MPI_Irecv for non-blocking communication. I wrote a subroutine called "halo_exchange" which I call each time I need to exchange halo cells between neighboring sub-domains. I'm able to split the domain up properly, and each rank knows its neighbors. In the code below, the neighbors are oriented North/South (i.e. I use a 1D row decomposition). All processes take part in the computation; in other words, every process calls this subroutine and needs to exchange data.
Originally I was using a single set of MPI_Isend/MPI_Irecv calls for both the North and South boundaries, but then I split it up, thinking maybe there was something wrong with passing "MPI_PROC_NULL" to the functions (the boundaries are not periodic). That is the reason for the if statements. The code keeps getting hung up on the "MPI_Waitall" statements and I don't know why. It literally just waits, and I'm not sure what it's waiting for.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

//---------------------------------------------------------------------------------------
// FUNCTION "halo_exchange"
//---------------------------------------------------------------------------------------
void halo_exchange(PREFIX **array, MPI_Comm topology_comm,
                   int nn, int S_neighbor, int N_neighbor)
{
  int halo = 2;
  int M    = 20;
  ...
  double *S_Recv, *N_Recv;
  double *S_Send, *N_Send;

  // Receive buffers
  S_Recv = (double *) calloc( M*halo, sizeof(double) );
  N_Recv = (double *) calloc( M*halo, sizeof(double) );

  // Send buffers
  S_Send = (double *) calloc( M*halo, sizeof(double) );
  N_Send = (double *) calloc( M*halo, sizeof(double) );
  ...
  // send buffers filled with data
  // recv buffers filled with zeros (is this ok...or do I need to use malloc?)
  ...
  if (S_neighbor == MPI_PROC_NULL)
  {
    MPI_Status  status[2];
    MPI_Request req[2];
    MPI_Isend(&N_Send, halo*M, MPI_DOUBLE, N_neighbor, 2, topology_comm, &req[0]);
    MPI_Irecv(&N_Recv, halo*M, MPI_DOUBLE, N_neighbor, 2, topology_comm, &req[1]);
    ...
    MPI_Waitall(2, req, status);
  }
  else if (N_neighbor == MPI_PROC_NULL)
  {
    MPI_Status  status[2];
    MPI_Request req[2];
    MPI_Isend(&S_Send, halo*M, MPI_DOUBLE, S_neighbor, 1, topology_comm, &req[0]);
    MPI_Irecv(&S_Recv, halo*M, MPI_DOUBLE, S_neighbor, 1, topology_comm, &req[1]);
    ...
    MPI_Waitall(2, req, status);
  }
  else
  {
    MPI_Status  status[4];
    MPI_Request req[4];
    MPI_Isend(&S_Send, halo*M, MPI_DOUBLE, S_neighbor, 1, topology_comm, &req[0]);
    MPI_Isend(&N_Send, halo*M, MPI_DOUBLE, N_neighbor, 2, topology_comm, &req[1]);
    MPI_Irecv(&N_Recv, halo*M, MPI_DOUBLE, N_neighbor, 2, topology_comm, &req[2]);
    MPI_Irecv(&S_Recv, halo*M, MPI_DOUBLE, S_neighbor, 1, topology_comm, &req[3]);
    ...
    MPI_Waitall(4, req, status);
  }
  ...
}
This was my original understanding, which is obviously missing something: since each process calls this subroutine, all of the send/recv functions get called; all processes then wait at their MPI_Waitall point for the corresponding communications to take place, and when those are done, execution moves on. Can someone tell me why mine isn't moving? Also, I'm not too clear on the "tag" argument (is that a clue?). Thanks in advance for all your help!