
I implemented a simple 1D Poisson equation parallel solver with MPI to familiarize myself with the MPI library. I designed the code to run with an arbitrary number of processors (including just 1).

The code runs and yields good results on 1 or 2 processors. However, it gets stuck on the mpi_send and mpi_recv calls with 4 processors, so I suspect my implementation of the ghost point exchange is wrong.

As the code is too large to include here, I've only included the Jacobi scheme and the exchange of data:

do iter=1,max_iter                                                                                    
    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
    ! Initial guess, on interior points only                                                          
    Ujacob(min_x+1:max_x-1) = 0._dp                                                                   
    Ujacob_all(0:grid_nx-1) = 0._dp                                                                   

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
    ! Store solution vector from last iteration                                                       
    Uold    (:) = Ujacob    (:)                                                                       
    Uold_all(:) = Ujacob_all(:)                                                                       

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
    ! Jacobi scheme                                                                                   
    do ii=min_x+1,max_x-1                                                                             
        Ujacob(ii) = 0.5_dp * (Uold  (ii-1) + Uold  (ii+1) - grid_delta_x**2 * Urhs(ii))
    end do                                                                                            

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
    ! Gather Ujacob vector                                                                            
    call mpi_allgather(Ujacob(0:proc_nx-1), proc_nx, mpi_float,                         &             
                    &  Ujacob_all,          proc_nx, mpi_float, mpi_comm_world, ierror)               

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
    ! Compute error and check if less than tolerance value                                            
    error = sqrt((sum(Ujacob_all - Uold_all)**2) / dble(grid_nx))                                     

    if(error < error_tol) return                                                                      
    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
    ! Exchange data points                                                                            

    ! Interior processors                                                                         
    if(Xsrc /= -1 .AND. Xdes /= -1) then
        call mpi_send(Ujacob(        0), 1, mpi_float, Xsrc, 200, mpi_comm_world, ierror)         
        call mpi_send(Ujacob(proc_nx-1), 1, mpi_float, Xdes, 100, mpi_comm_world, ierror)         

        call mpi_recv(Ujacob(     -1), 1, mpi_float, Xsrc, 100, mpi_comm_world, stat, ierror)     
        call mpi_recv(Ujacob(proc_nx), 1, mpi_float, Xdes, 200, mpi_comm_world, stat, ierror)
    ! First processor                                                                             
    elseif(Xsrc == -1) then                                                                    
        call mpi_send(Ujacob(proc_nx-1), 1, mpi_float, Xdes, 100, mpi_comm_world, ierror)         
        call mpi_recv(Ujacob(proc_nx  ), 1, mpi_float, Xdes, 200, mpi_comm_world, stat, ierror)
    ! Last processor                                                                              
    elseif(Xdes == -1) then                                                                   
        call mpi_send(Ujacob( 0), 1, mpi_float, Xsrc, 200, mpi_comm_world, ierror)                
        call mpi_recv(Ujacob(-1), 1, mpi_float, Xsrc, 100, mpi_comm_world, stat, ierror)          
    end if                                                                                                           
end do  

Xsrc and Xdes are set as follows:

   !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
    ! Setting the source and destination neighbors of each processor                              
    if(myid == 0) then                                                                            
        Xsrc = -1                                                                                 
        Xdes = myid + 1                                                                           
    elseif(myid == nprocs-1) then                                                                 
        Xsrc = myid -1                                                                            
        Xdes = -1                                                                                 
    else                                                                                          
        Xsrc = myid - 1                                                                           
        Xsrc = myid + 1                                                                           
    end if  

Also, I have checked that processor ranks 0 and nprocs-1 indeed correspond to the left and right boundary processors.

I have checked that the tags are set correctly. Also, feel free to comment on anything you feel could be improved.

I think it may be useful to see how you set Xsrc and Xdes. – d_1999
@d_1999 Yes, you're right, please see the edit. – solalito
Your halo exchange is conceptually flawed, as you rely on MPI_SEND being buffered, which might not always be the case. Use MPI_SENDRECV to send and receive at the same time without blocking. Also, don't use such complex logic for sending and receiving: simply use MPI_PROC_NULL instead of -1 for the non-existent neighbours of the boundary ranks and always execute the sendrecv in both directions; sending to MPI_PROC_NULL or receiving from it is a no-op (see the sketch after these comments). – Hristo Iliev
I think you've got a typo in your last branch from your edit. – d_1999
Computation is fast, communication not so much. You are not going to see any speedup with such small arrays. – Hristo Iliev
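
A minimal sketch of the pattern Hristo Iliev describes, reusing the variable names from the question (Ujacob, proc_nx, myid, nprocs, Xsrc, Xdes, ierror) and assuming Ujacob is declared with a double-precision real kind (otherwise substitute the MPI type matching your kind):

    ! Boundary ranks get MPI_PROC_NULL instead of -1, so every rank executes
    ! the same two calls; sending to or receiving from MPI_PROC_NULL is a no-op.
    if(myid == 0) then
        Xsrc = mpi_proc_null
    else
        Xsrc = myid - 1
    end if
    if(myid == nprocs-1) then
        Xdes = mpi_proc_null
    else
        Xdes = myid + 1
    end if

    ! Send the rightmost owned point to the right neighbour while receiving
    ! the left ghost point from the left neighbour in a single call
    call mpi_sendrecv(Ujacob(proc_nx-1), 1, mpi_double_precision, Xdes, 100,          &
                   &  Ujacob(-1),        1, mpi_double_precision, Xsrc, 100,          &
                   &  mpi_comm_world, mpi_status_ignore, ierror)
    ! Send the leftmost owned point to the left neighbour while receiving
    ! the right ghost point from the right neighbour
    call mpi_sendrecv(Ujacob(0),         1, mpi_double_precision, Xsrc, 200,          &
                   &  Ujacob(proc_nx),   1, mpi_double_precision, Xdes, 200,          &
                   &  mpi_comm_world, mpi_status_ignore, ierror)

With nprocs = 1 both neighbours are MPI_PROC_NULL, so the same code also covers the single-process case.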

1 Answer


@Hristo is correct that your code is conceptually flawed. However, almost every MPI implementation will buffer MPI_Send for a message containing a single real (though of course this is not guaranteed), so this is not the issue with your code.

I think you have mismatched your tags; the edge cases should have them reversed:

    elseif(Xsrc == -1) then                                                                    
        call mpi_send(Ujacob(proc_nx-1), 1, mpi_float, Xdes, 200, mpi_comm_world, ierror)         
        call mpi_recv(Ujacob(proc_nx  ), 1, mpi_float, Xdes, 100, mpi_comm_world, stat, ierror)
    ! Last processor                                                                              
    elseif(Xdes == -1) then                                                                   
        call mpi_send(Ujacob( 0), 1, mpi_float, Xsrc, 100, mpi_comm_world, ierror)                
        call mpi_recv(Ujacob(-1), 1, mpi_float, Xsrc, 200, mpi_comm_world, stat, ierror)          
    end if      

A few other comments on the code:

  • it is very inefficient to compute the error term with an allgather: you should sum over the local elements only and then compute the global error with MPI_Allreduce (see the sketch after this list);
  • you should use MPI_REAL, not MPI_FLOAT, for Fortran code;
  • I do not see how your code can run on a single process: there the process will execute the first elseif clause and then try to send to a non-existent rank.
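
A sketch of the allreduce-based error check, reusing Ujacob, Uold, min_x, max_x, grid_nx, error and ierror from the question; local_err and global_err are new double-precision scalars introduced here for illustration (note that this sums the squared differences, i.e. the usual discrete L2 criterion):

    ! Sum the squared differences over the locally owned interior points only,
    ! then combine the partial sums from all ranks in one collective call
    local_err = sum((Ujacob(min_x+1:max_x-1) - Uold(min_x+1:max_x-1))**2)
    call mpi_allreduce(local_err, global_err, 1, mpi_double_precision, mpi_sum,       &
                    &  mpi_comm_world, ierror)
    error = sqrt(global_err / dble(grid_nx))

This avoids gathering the whole solution vector on every rank just to evaluate a scalar convergence criterion.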

Once you have checked that your tags are now correct, you should then fix the issues pointed out by @Hristo.