
I am creating a program to calculate the potential between two conductors using MPI. I am using non-blocking sends and receives so that calculations can be done while information is sent between processors.

However, the if statements between the Isend/Irecv calls and the Wait calls, which contain the calculations, are not being entered. When the if statements and the calculations are removed, the program proceeds to the Wait statements.

I have checked that the calculations are correct and not causing issues, and that the conditions for the if statements are correct.

Here is a section of test code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  /*MPI Specific Variables*/
  int my_size, my_rank, up, down;
  MPI_Request reqU, reqD, sreqU, sreqD;
  MPI_Status rUstatus, rDstatus, sUstatus, sDstatus;  

  /*Physical Dimensions*/
  double phi_0 = 1000.0;/*V*/

  /*Other Variables*/
  int grid_size = 100;
  int slice = 50;
  int x,y;
  double grid_res_y = 0.2;
  double grid_res_x = 0.1;
  int xboundary = 10;
  int yboundary = 25;
  int boundary_proc = 2;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &my_size);

  /*Determining neighbours*/
  if (my_rank != 0) /*if statements used so the highest and lowest ranks' neighbours aren't outside the 0 to my_size-1 range of ranks*/
    {
      up = my_rank-1;
    }
  else
    {
      up = MPI_PROC_NULL;
    }

  if(my_rank != my_size-1)
    {
      down = my_rank+1;
    }
  else
   {
      down = MPI_PROC_NULL;
    }

  /*cross-check: my_size is presumed to be a factor of grid_size, otherwise there are odd-sized slices and this is not coded for*/
  if (grid_size%my_size != 0)
    {
      printf("ERROR - number of procs =  %d, this is not a factor of grid_size %d\n", my_size, grid_size);
      exit(0);
    }

  /*Set Up Distributed Data Approach*/
  double phi[slice+2][grid_size]; /*extra 2 rows to allow for halo data*/

  for (y=0; y < slice+2; y++)
    {
      for (x=0; x < grid_size; x++)
        { 
          phi[y][x] = 0.0;
        }
    }

  if(my_rank == 0) /*Boundary Containing rank does 2 loops. One over part with inner conductor and one over part without inner conductor*/
    {
      for(y=0; y < slice+1; y++)
        {
          for(x=xboundary; x < grid_size; x++)
            {
              phi[y][x] = phi_0;
            }
        }   
    }


  if (my_rank < my_size-1)
    {
      /*send topmost strip up one node to be received as bottom halo*/
      MPI_Isend(&phi[1][0], grid_size  , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &sreqU);  
      /*recv top halo from up one node*/
      MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &reqU);
    }

  if (my_rank > 0)
    {
      /*recv top halo from down one node*/
      MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &reqD);
      /*send bottommost strip down one node to be received as top halo*/
      MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &sreqD);
    }

  printf("send/recv complete");

  if (my_rank < boundary_proc)
     {
        printf("rank %d Entered if", my_rank);
        /*Calculations*/
     }

  else if(my_rank > boundary_proc)
    {
        printf("rank %d Entered else if", my_rank);
        /*calculations*/
    }

  else
     {
        printf("rank %d Entered else", my_rank);
        /*calculations*/
     }

  if (my_rank<my_size-1)
   {
     /*Wait for send to down one rank to complete*/
     MPI_Wait(&sreqD, &sDstatus);
     /*Wait for receive from up one rank to complete*/
     MPI_Wait(&reqD, &rDstatus);
   }

  if (my_rank>0)
   {
     /*Wait for send up one rank to complete*/
     MPI_Wait(&sreqU, &sUstatus);
     /*Wait for receive from down one rank to complete*/
     MPI_Wait(&reqU, &rUstatus);
   }

  printf("Wait complete");
  MPI_Finalize();
  return 0;
}

All the print statements should be printed with their respective ranks. Currently it only makes it as far as "send/recv complete". I am only testing on 2 processors at the moment.

1 Answer


Mismatching tags

Tags must match for each pair of communication operations, i.e. there must be a send and a receive with the same tag. In your case, both sends use tag 1 and both receives use tag 2, so no send ever matches a receive. Change it so that sending down and receiving from up use the same tag, and likewise for sending up and receiving from down, e.g.

if (my_rank < my_size-1) {
    /*send topmost strip up one node to be received as bottom halo*/
    MPI_Isend(&phi[1][0], grid_size  , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &sreqU);  
    /*recv top halo from up one node*/
    MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &reqU);
}

if (my_rank > 0) {
    /*recv top halo from down one node*/
    MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &reqD);
    /*send bottommost strip down one node to be received as top halo*/
    MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &sreqD);
}

Mismatching request objects

On the border ranks, you are waiting on the wrong requests; this is fixed simply by swapping the bodies of the two MPI_Wait if blocks, as sketched below.
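
As a minimal sketch (keeping the request and status names from the question, where sreqU/reqU are the requests posted in the my_rank < my_size-1 block and sreqD/reqD those posted in the my_rank > 0 block), the swapped waits would look like:

if (my_rank < my_size-1) {
    /*this rank posted sreqU and reqU above, so these are the requests to wait on*/
    MPI_Wait(&sreqU, &sUstatus);
    MPI_Wait(&reqU, &rUstatus);
}

if (my_rank > 0) {
    /*this rank posted sreqD and reqD above, so these are the requests to wait on*/
    MPI_Wait(&sreqD, &sDstatus);
    MPI_Wait(&reqD, &rDstatus);
}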

Waiting for multiple non-blocking operations

Contrary to some of the discussion in the now-deleted answers, it is correct to wait for multiple ongoing non-blocking communications with multiple individual waits [1].

Nevertheless, it is strictly better to use an array of requests and MPI_Waitall. It leads to cleaner code and would have prevented the mistake of mixing up the requests in the first place. It also gives the MPI implementation more freedom to optimize. This can look like the following:

MPI_Request requests[MAX_REQUESTS];
MPI_Status statuses[MAX_REQUESTS];
int num_requests = 0;

// ...

MPI_Isend(..., &requests[num_requests++]);

// ...

MPI_Waitall(num_requests, requests, statuses);

Or, you could utilize the fact that MPI_Waitall permits elements of the requests array to be MPI_REQUEST_NULL. That allows you to keep track of which request is which, and is ultimately a matter of style.

typedef enum {
    RECV_UP, RECV_DOWN, SEND_UP, SEND_DOWN, MAX_REQUESTS
} MyRequests;

MPI_Request requests[MAX_REQUESTS];
MPI_Status statuses[MAX_REQUESTS];

if (my_rank < my_size-1) {
    /*send topmost strip up one node to be received as bottom halo*/
    MPI_Isend(&phi[1][0], grid_size  , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &requests[SEND_DOWN]);  
    /*recv top halo from up one node*/
    MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &requests[RECV_DOWN]);
} else {
    requests[RECV_DOWN] = requests[SEND_DOWN] = MPI_REQUEST_NULL;
}

if (my_rank > 0) {
    /*recv top halo from down one node*/
    MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &requests[RECV_UP]);
    /*send bottommost strip down one node to be received as top halo*/
    MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &requests[SEND_UP]);
} else {
    requests[RECV_UP] = requests[SEND_UP] = MPI_REQUEST_NULL;
}

// ...

MPI_Waitall(MAX_REQUESTS, requests, statuses);

[1]: This is mandated by the non-blocking progress guarantee in the MPI Standard (Section 3.7.4):

Progress: A call to MPI_WAIT that completes a receive will eventually terminate and return if a matching send has been started, unless the send is satisfied by another receive. In particular, if the matching send is nonblocking, then the receive should complete even if no call is executed by the sender to complete the send. Similarly, a call to MPI_WAIT that completes a send will eventually return if a matching receive has been started, unless the receive is satisfied by another send, and even if no call is executed to complete the receive.