1 vote

I am communicating large vectors between processes for a numerical simulation. Everything works fine until a certain time step. I don't get errors, but the output solution is obviously incorrect.

I have been debugging this for quite a while now, and my assumption is that there is an error in the MPI communication.

The communication part of my code looks like this:

MPI_Request req;
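// 1) Send the size of toProc[j] to every other rank (tag 0).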
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        tag=0;
        sizeToSend=toProc[j].size();
        MPI_Isend(&sizeToSend, 1, MPI_LONG_LONG, j, tag, MPI_COMM_WORLD,&req);
        MPI_Request_free(&req);
    }
}
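// 2) Receive those sizes from every other rank.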
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        tag=0;
        MPI_Recv(&sizeToReceive[j], 1, MPI_LONG_LONG, j, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}
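// 3) Send the contents of toProc[j] to every other rank (tag 1).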
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        if(toProc[j].size()>0){
            tag=1;
            MPI_Isend(&toProc[j][0], toProc[j].size(), MPI_LONG_LONG, j, tag, MPI_COMM_WORLD,&req);
            MPI_Request_free(&req);
        }
    }
}
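// 4) Receive the index data and mark those cells in domain.field.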
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        if(sizeToReceive[j]>0){
            receiveBuffer.resize(sizeToReceive[j]);
            tag=1;
            MPI_Recv(&receiveBuffer[0], sizeToReceive[j], MPI_LONG_LONG, j, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for(int k=0;k<sizeToReceive[j];k++){
                domain.field[receiveBuffer[k]]=1;
            }
            receiveBuffer.clear();
        }
    }
}
MPI_Barrier(MPI_COMM_WORLD);
for(int j=0;j<toProc.size();j++){
    toProc[j].clear();
}

The variable numProcs is an int containing the number of processes, myId is an int containing the process's rank, tag is an int, and domain.field is a vector<char>. The other necessary variables are defined like this:

vector<vector <long long> > toProc;
toProc.resize(numProcs);
long long sizeToReceive[numProcs];
long long sizeToSend=0;
vector<long long> receiveBuffer;

What I am trying to do in the code above is to send the vector toProc[j] to the process with rank j, for j=0,...,numProcs-1 and j!=myId, on each process. To achieve this I send and receive the sizes of these vectors in the first two for-loops, and send and receive the actual data in the third and fourth for-loops. I am using Isend because I want these calls to be non-blocking.

The values in toProc[j] are the indices that have to be set to 1 in the vector domain.field on process j (each process has its own domain.field).

My question is: do you see any potential for unexpected behaviour in my usage of this Isend/Recv pattern?

Zulan: I don't see an immediate issue, besides maybe spamming too many ongoing requests, but it seems you could greatly simplify and speed up the whole operation via MPI_Alltoall and MPI_Alltoallv.

Jonas: Thanks for the suggestion, I will try to implement the same behaviour using MPI_Alltoall. How many requests would you consider too many? The error also occurs if I only use 4 processes; can that already be too many?

Zulan: It seems I overlooked the quite obvious issue; please see my answer.

Zulan: Four requests would be no issue. But if you want a scalable application, you should use the collectives provided by MPI when working with groups of processes. A lot of optimization goes into those collective implementations.

1 Answer

2 votes

You are reusing the variable sizeToSend as the send buffer for multiple Isend requests without waiting for their completion.

Section 3.7.2 of the MPI Standard says about nonblocking send buffers:

A nonblocking send call indicates that the system may start copying data out of the send buffer. The sender should not modify any part of the send buffer after a nonblocking send operation is called, until the send completes.

This means you must not overwrite sizeToSend before the send completes. Since your first loop overwrites sizeToSend on every iteration, a still-ongoing send may read a size that has already been replaced.

Section 3.7.4 says about MPI_Request_free:

Mark the request object for deallocation and set request to MPI_REQUEST_NULL. An ongoing communication that is associated with the request will be allowed to complete. The request will be deallocated only after its completion.

This means the send is not guaranteed to have completed when MPI_Request_free returns, so freeing the request gives you no point at which reusing the buffer becomes safe.
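A minimal sketch of a correct version of the size exchange, keeping the question's surrounding variables; the names sizeToSendTo and sizeRequests are made up for illustration:

vector<long long> sizeToSendTo(numProcs);  // one send buffer per destination
vector<MPI_Request> sizeRequests;          // keep requests until completion
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        sizeToSendTo[j]=toProc[j].size();  // each pending send reads its own slot
        MPI_Request req;
        MPI_Isend(&sizeToSendTo[j], 1, MPI_LONG_LONG, j, 0, MPI_COMM_WORLD, &req);
        sizeRequests.push_back(req);
    }
}
// ... post the matching MPI_Recv calls here, as in the original second loop ...
// Wait for all sends; only afterwards may the buffers be modified or reused.
MPI_Waitall((int)sizeRequests.size(), sizeRequests.data(), MPI_STATUSES_IGNORE);

After MPI_Waitall returns, all size sends have completed and the buffers can safely be reused; the same pattern applies to the data sends in the third loop.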

You could restructure your code along the lines of the sketch above, keeping one size per destination in a vector and also keeping the open requests in a vector so you can properly MPI_Waitall on them. But I would advise you to just use MPI_Alltoall and MPI_Alltoallv for the whole operation, for example as follows.
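A rough sketch of the collective version, reusing the question's variables (numProcs, myId, toProc, domain.field) and assuming every per-destination count fits into an int, since MPI_Alltoallv takes int counts and displacements; it also assumes the usual headers (<algorithm> for copy) and using namespace std, as the question's code suggests:

// Each process announces how many indices it will send to every rank.
vector<int> sendCounts(numProcs), recvCounts(numProcs);
for(int j=0;j<numProcs;j++)
    sendCounts[j] = (j==myId) ? 0 : (int)toProc[j].size();
MPI_Alltoall(sendCounts.data(), 1, MPI_INT,
             recvCounts.data(), 1, MPI_INT, MPI_COMM_WORLD);

// Displacements and flattened buffers for the variable-sized exchange.
vector<int> sendDispls(numProcs,0), recvDispls(numProcs,0);
for(int j=1;j<numProcs;j++){
    sendDispls[j]=sendDispls[j-1]+sendCounts[j-1];
    recvDispls[j]=recvDispls[j-1]+recvCounts[j-1];
}
vector<long long> sendBuf(sendDispls[numProcs-1]+sendCounts[numProcs-1]);
vector<long long> recvBuf(recvDispls[numProcs-1]+recvCounts[numProcs-1]);
for(int j=0;j<numProcs;j++)
    copy(toProc[j].begin(), toProc[j].begin()+sendCounts[j],
         sendBuf.begin()+sendDispls[j]);

// One collective call replaces all the point-to-point data messages.
MPI_Alltoallv(sendBuf.data(), sendCounts.data(), sendDispls.data(), MPI_LONG_LONG,
              recvBuf.data(), recvCounts.data(), recvDispls.data(), MPI_LONG_LONG,
              MPI_COMM_WORLD);

for(size_t k=0;k<recvBuf.size();k++)
    domain.field[recvBuf[k]]=1;

Besides being correct by construction, the collectives let the MPI implementation choose an optimized exchange pattern instead of numProcs*(numProcs-1) individual messages.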