I am communicating large vectors between processes for a numerical simulation. Everything works fine until a certain time step. I don't get errors, but the output solution is obviously incorrect.
I have been debugging this for quite a while now, and my assumption is that there is an error in the MPI communication.
The communication part of my code looks like this:
MPI_Request req;

// Phase 1: send the number of elements destined for each other rank (non-blocking).
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        tag=0;
        sizeToSend=toProc[j].size();
        MPI_Isend(&sizeToSend, 1, MPI_LONG_LONG, j, tag, MPI_COMM_WORLD, &req);
        MPI_Request_free(&req);
    }
}

// Phase 2: receive the element counts from every other rank.
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        tag=0;
        MPI_Recv(&sizeToReceive[j], 1, MPI_LONG_LONG, j, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}

// Phase 3: send the actual index data (non-blocking).
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        if(toProc[j].size()>0){
            tag=1;
            MPI_Isend(&toProc[j][0], toProc[j].size(), MPI_LONG_LONG, j, tag, MPI_COMM_WORLD, &req);
            MPI_Request_free(&req);
        }
    }
}

// Phase 4: receive the index data and mark the corresponding entries in domain.field.
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        if(sizeToReceive[j]>0){
            receiveBuffer.resize(sizeToReceive[j]);
            tag=1;
            MPI_Recv(&receiveBuffer[0], sizeToReceive[j], MPI_LONG_LONG, j, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for(int k=0;k<sizeToReceive[j];k++){
                domain.field[receiveBuffer[k]]=1;
            }
            receiveBuffer.clear();
        }
    }
}

MPI_Barrier(MPI_COMM_WORLD);

// Clear the outgoing buffers for the next time step.
for(int j=0;j<toProc.size();j++){
    toProc[j].clear();
}
The variable numProcs is an int containing the number of processes, myId is an int containing the rank of the process, tag is an int, and domain.field is a vector<char>.
The other necessary variables are defined like this:
vector<vector <long long> > toProc;
toProc.resize(numProcs);
long long sizeToReceive[numProcs];
long long sizeToSend=0;
vector<long long> receiveBuffer;
What I am trying to do in the code above is to send each vector toProc[j] to the process with id == j, for j = 0, ..., numProcs-1 and j != myId, on every process.
To achieve this I am sending and receiving the sizes of these vectors in the first two for-loops and sending and receiving the actual data in the third and fourth for-loops. I am using Isend because I obviously want these calls to be non-blocking.
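For reference, here is a minimal, untested sketch of how I imagine the size exchange would look if I kept the send requests in a vector and completed them with MPI_Waitall instead of freeing them right away (the names sendRequests and sizesToSend are only illustrative, they are not in my actual code):

std::vector<MPI_Request> sendRequests;
std::vector<long long> sizesToSend(numProcs, 0);   // one send buffer per destination rank
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        sizesToSend[j]=toProc[j].size();
        MPI_Request r;
        MPI_Isend(&sizesToSend[j], 1, MPI_LONG_LONG, j, 0, MPI_COMM_WORLD, &r);
        sendRequests.push_back(r);
    }
}
// ... the matching MPI_Recv loop from above goes here ...
// Block until all non-blocking sends have actually completed.
MPI_Waitall((int)sendRequests.size(), sendRequests.data(), MPI_STATUSES_IGNORE);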
The values in toProc[j] are the indices that have to be set to 1 in the vector domain.field on process j (each process has its own domain.field).
My question is: do you see any potential for unexpected behaviour in my usage of this Isend/Recv scheme?
Comments:
– Zulan: […] MPI_Alltoall and MPI_Alltoallv.
– Jonas: […] MPI_Alltoall, how many requests would you consider too many? The error also occurs if I only use 4 processes; can that already be too many?
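If I were to follow the suggestion and replace the first two loops with MPI_Alltoall, I believe the size exchange would look roughly like this (an untested sketch; sendCounts is only an illustrative name):

std::vector<long long> sendCounts(numProcs, 0);   // one count per destination rank; the entry for myId stays 0
for(int j=0;j<numProcs;j++){
    if(j!=myId){
        sendCounts[j]=toProc[j].size();
    }
}
// Every rank sends one long long to every rank and receives one back;
// afterwards sizeToReceive[j] holds the count announced by rank j.
MPI_Alltoall(sendCounts.data(), 1, MPI_LONG_LONG, sizeToReceive, 1, MPI_LONG_LONG, MPI_COMM_WORLD);

The actual index data would then presumably go through MPI_Alltoallv with displacements derived from these counts, but I have not tried that yet.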