I wish to discover the cause of an error in an MPI program. The program is one big while loop; in each iteration, every processor exchanges messages with its neighbors using ISEND and IRECV as follows:
while ( t < a very large number ) ...
  do i=1,8
    if ( something that is almost always true ) then
      call MPI_ISEND(A,A_buffer,inewtype,neighrank(i),2,MPI_COMM_WORLD,isend,ierr)
      call MPI_WAIT(isend,istatus,ierr)
      call MPI_ISEND(B,B_buffer,MPI_INTEGER4,neighrank(i),3,MPI_COMM_WORLD,isend,ierr)
      call MPI_WAIT(isend,istatus,ierr)
    end if
  end do
  do i=1,8
    if ( something that is almost always true ) then
      call MPI_IRECV(C,C_buffer,inewtype,neighrank(i),2,MPI_COMM_WORLD,irecv,ierr)
      call MPI_WAIT(irecv,istatus,ierr)
      call MPI_IRECV(D,D_buffer,MPI_INTEGER4,neighrank(i),3,MPI_COMM_WORLD,irecv,ierr)
      call MPI_WAIT(irecv,istatus,ierr)
    end if
  end do
The program produces a segmentation fault error after a very large number of iterations. At each iteration, the same amount of data is passed among the processors, but the number of calls to ISEND and IRECV is adjustable (e.g. use 80 calls to pass 80kb total or 40 calls to pass 160kb total). If the number of calls is small, the program crashes earlier.
I suspect that something about InfiniBand is causing this error, but I do not get an "insufficient virtual memory" error, so perhaps it cannot be InfiniBand? What could cause this error?
Why use a non-blocking call immediately followed by MPI_WAIT when a simple MPI_SEND/RECV would do exactly the same? Then I would recommend that you compile your program with debugging enabled and then examine the core file to find where the crash occurs. – Hristo Iliev
MPI_ISEND and MPI_IRECV belong to the class of non-blocking communication operations that execute in the background. They allow you to do computations while the communication takes place (e.g. between MPI_ISEND and the matching MPI_WAIT), which often leads to faster overall program execution. But the operations themselves are only as fast as their blocking counterparts. – Hristo Iliev
MPI_ISEND/RECV also helps prevent the deadlock that can occur with buffer overflow when using the regular MPI_SEND/RECV. – bob.sacamento
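To illustrate the point made in the comments about overlapping computation with communication, here is a minimal sketch of the usual non-blocking pattern: post the receive and the send first, do unrelated work, and only then wait on both requests. It is not the asker's program. The ring topology and the names sendbuf, recvbuf, n, left and right are assumptions made up for this example, and it assumes an MPI installation that provides the mpi Fortran module.

```fortran
! Sketch only: a ring exchange where each rank sends to its right
! neighbor and receives from its left neighbor. All buffer names and
! sizes are hypothetical and chosen for illustration.
program overlap_sketch
  use mpi
  implicit none
  integer, parameter :: n = 1000
  integer :: ierr, rank, nprocs, left, right, i
  integer :: requests(2)
  integer :: statuses(MPI_STATUS_SIZE, 2)
  integer :: sendbuf(n), recvbuf(n)
  double precision :: local_sum

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  right = mod(rank + 1, nprocs)
  left  = mod(rank - 1 + nprocs, nprocs)
  sendbuf = rank
  recvbuf = -1

  ! Post the receive first, then the send. Neither call blocks, so
  ! the order in which ranks reach this point does not matter.
  call MPI_IRECV(recvbuf, n, MPI_INTEGER, left, 2, MPI_COMM_WORLD, requests(1), ierr)
  call MPI_ISEND(sendbuf, n, MPI_INTEGER, right, 2, MPI_COMM_WORLD, requests(2), ierr)

  ! Work that touches neither sendbuf nor recvbuf can run while the
  ! messages are in flight.
  local_sum = 0.0d0
  do i = 1, n
     local_sum = local_sum + dble(i)
  end do

  ! Both buffers are safe to reuse only after the requests complete.
  call MPI_WAITALL(2, requests, statuses, ierr)

  if (all(recvbuf == left)) then
     print *, 'rank', rank, 'received data from rank', left
  else
     print *, 'rank', rank, 'received unexpected data'
  end if

  call MPI_FINALIZE(ierr)
end program overlap_sketch
```

With a typical MPI installation this would be built and run with something like mpif90 overlap_sketch.f90 followed by mpirun -np 4 ./a.out, though the wrapper and launcher names vary between systems. Each request handle gets its own slot in the requests array, so no handle is overwritten before its operation has completed.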