
I have two functions that use different algorithms. In the first function I implemented non-blocking communication (MPI_Irecv, MPI_Isend) and the program runs without any errors. Even when I change the non-blocking calls to blocking communication, everything is fine: no deadlock. But the problem appears when I implement the second function with basic blocking communication like this (the algorithm reduced to the relevant part):

if (my_rank == 0)
{
  a = 3;
  MPI_Send(&a, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
}
else if (my_rank == 1)
{
  MPI_Recv(&a, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
}

So, process 1 should receive the value a from process 0. But I'm getting this error:

Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(187).......................: MPI_Recv(buf=0xbfbef2a8, count=1, MPI_DOUBLE, src=0, tag=0, MPI_COMM_WORLD, status=0xbfbef294) failed
MPIDI_CH3U_Request_unpack_uebuf(600): Message truncated; 32 bytes received but buffer size is 8
rank 2 in job 39 Blabla caused collective abort of all ranks
exit status of rank 2: killed by signal 9

If I run the program with only one of the two functions, each works as it is supposed to. But running both together results in the error message above. I do understand the error message, but I don't know what I can do to prevent it. Can someone explain to me where I have to look for the error? Since I'm not getting a deadlock in the first function, I'm assuming that there can't be an unreceived send left over from the first function that leads to the error in the second.
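To make sure I understand what such a leftover send would do, here is a minimal, self-contained sketch (not my actual program, just an illustration of the mechanism): phase 1 leaves one send unreceived in the same communicator with the same tag, and phase 2 is the code above. There is no deadlock, because small messages are typically delivered eagerly, yet the receive in phase 2 is matched to the leftover 32-byte message first and should fail with the same kind of truncation error:

/* Minimal illustration (not the real program).  Run with: mpiexec -n 2 ./a.out */
#include <mpi.h>

int main(int argc, char **argv)
{
  int my_rank;
  double a = 0.0;
  double big[4] = { 0.0, 1.0, 2.0, 3.0 };
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Phase 1: rank 0 sends 4 doubles (32 bytes) that rank 1 never receives.
     The message is small, so the send typically completes eagerly: no deadlock. */
  if (my_rank == 0)
    MPI_Send(big, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);

  /* Phase 2: same pattern as the second function. */
  if (my_rank == 0)
  {
    a = 3;
    MPI_Send(&a, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
  }
  else if (my_rank == 1)
  {
    /* Same source, tag and communicator as the leftover message, and messages
       between a pair of ranks are non-overtaking, so this Recv matches the
       32-byte message from phase 1: "Message truncated". */
    MPI_Recv(&a, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
  }

  MPI_Finalize();
  return 0;
}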

The issue isn't Recv vs. Irecv, and it certainly isn't any deadlock. The issue is that rank 2's Recv() from task 0, where it's receiving a single MPI_DOUBLE, is being matched to a send from rank 0 of size 32 bytes (4 doubles, maybe?). Thus the "message truncated" error. So we'll need to see more code to figure out what's going on. – Jonathan Dursi

I know that. The second function consists only of the send/recv operation. The problem is definitely located in the first function, because (depending on the user input) that is exactly the size of the communicated arrays. But how could that be? All the send/recv operations must have finished, otherwise it would result in a deadlock. Or am I completely wrong? The whole code is about six hundred lines. I'll have to simplify it... that could take a while. – Rade
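A quick way to check which message that failing MPI_Recv is actually matching is to probe it before receiving. A sketch (assuming the receive in the second function runs on rank 1 with source 0 and tag 0; peek_pending is just an illustrative helper name):

#include <mpi.h>
#include <stdio.h>

/* Print source, tag and size of the next message that would match (source, tag). */
static void peek_pending(int source, int tag, MPI_Comm comm)
{
  MPI_Status st;
  int nbytes;

  MPI_Probe(source, tag, comm, &st);        /* blocks until a matching message arrives */
  MPI_Get_count(&st, MPI_BYTE, &nbytes);    /* message size in bytes */
  printf("pending message: source=%d tag=%d size=%d bytes\n",
         st.MPI_SOURCE, st.MPI_TAG, nbytes);
}

Called right before the MPI_Recv in the second function, a report of 32 bytes instead of 8 would mean the receive is matching a message left over from the first function.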

1 Answer


So, here is the first function:

/* Derived datatype for one column of the m x m matrix T:
   m doubles with a stride of m doubles between them. */
MPI_Type_vector(m, 1, m, MPI_DOUBLE, &column_mpi_t);
MPI_Type_commit(&column_mpi_t);

/* T_data holds the matrix contiguously; T[i] points to row i. */
T      = (double**)malloc(m * sizeof(double*));
T_data = (double*)malloc(m * m * sizeof(double));

for (i = 0; i < m; i++)
{
  T[i] = &(T_data[i*m]);
}

/* Rank 0 sends column 0 to every other rank. */
if (my_rank == 0)
{
  s = &(T[0][0]);
  for (i = 1; i < p; i++)
  {
    MPI_Send(s, 1, column_mpi_t, i, 0, MPI_COMM_WORLD);
  }
}

for (k = 0; k < m-1; k++)
{
  /* Every rank that does not own column k receives it from its owner. */
  if (k % p != my_rank)
  {
    rbuffer = &(T[0][k]);
    MPI_Recv(rbuffer, 1, column_mpi_t, k % p, 0, MPI_COMM_WORLD, &status);
  }

  for (j = k+1; j < n; j++)
  {
    if (j % p == my_rank)
    {
      /* The owner of column k+1 sends it to all other ranks
         (skipped when j == n-1). */
      if (j == k+1 && j != n-1)
      {
        sbuffer = &(T[0][k+1]);
        for (i = 0; i < p; i++)
        {
          if (i != (k+1) % p)
            MPI_Send(sbuffer, 1, column_mpi_t, i, 0, MPI_COMM_WORLD);
        }
      }
    }
  }
}
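As a quick sanity check (an extra snippet, not part of the original code; assumes stdio.h is included), the amount of data in one column_mpi_t message can be printed right after MPI_Type_commit. It is m doubles, which for m = 4 is exactly the 32 bytes reported in the error:

int type_size;
MPI_Type_size(column_mpi_t, &type_size);   /* number of data bytes in one column message */
printf("column_mpi_t carries %d bytes = %d doubles\n", type_size, m);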

I came to the conclusion that the derived datatype is the origin of my problems. Does anybody see why?

OK, I'm wrong. If I change the MPI datatype in MPI_Irecv/MPI_Isend to MPI_DOUBLE, it fits the recv/send datatypes of the second function, so there is no truncation error. So, no solution...
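One way to at least rule out cross-matching between the two functions would be to give the second function its own communicator (or its own tag), so that a message left over from the first function can never be matched by a receive in the second. A rough sketch, with phase2_comm as a made-up name:

MPI_Comm phase2_comm;
MPI_Comm_dup(MPI_COMM_WORLD, &phase2_comm);   /* same ranks, separate matching space */

/* ... the first function keeps using MPI_COMM_WORLD ... */

if (my_rank == 0)
{
  a = 3;
  MPI_Send(&a, 1, MPI_DOUBLE, 1, 0, phase2_comm);
}
else if (my_rank == 1)
{
  MPI_Recv(&a, 1, MPI_DOUBLE, 0, 0, phase2_comm, &status);
}

MPI_Comm_free(&phase2_comm);

This doesn't remove an unmatched send in the first function, but it keeps it from being swallowed by the second function's receive, which makes the real culprit easier to track down.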