I'm new to MPI. I have 4 processes: processes 1 through 3 each populate a vector and send it to process 0, and process 0 collects the vectors into one very long vector. I have working code (too long to post here), but process 0's receive loop is clumsy and very slow.
In the abstract, the code does the following:
MPI::Init();
int id = MPI::COMM_WORLD.Get_rank();

if (id > 0) {
    double* my_array = new double[n*m];   // n, m are ints known to every process
    Populate(my_array, id);
    MPI::COMM_WORLD.Send(my_array, n*m, MPI::DOUBLE, 0, 50);
}

if (id == 0) {
    double* all_arrays = new double[3*n*m];
    /* Slow code starts here */
    double startcomm = MPI::Wtime();
    for (int i = 1; i <= 3; i++) {
        // blocking receive of rank i's block into slot i-1
        MPI::COMM_WORLD.Recv(&all_arrays[(i-1)*n*m], n*m, MPI::DOUBLE, i, 50);
    }
    double endcomm = MPI::Wtime();
    // Process 0 has more operations...
}

MPI::Finalize();
It turns out that endcomm - startcomm accounts for roughly 50% of the total run time (0.7 seconds out of 1.5 seconds for the whole program). Is there a better way to receive the vectors from processes 1-3 and store them in process 0's all_arrays?
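One idea I had is to post all three receives up front with non-blocking calls and wait once at the end, so the incoming messages don't have to be drained strictly in rank order. Here is an untested sketch (plain C interface) of what would replace the loop above:

double startcomm = MPI_Wtime();
MPI_Request reqs[3];
for (int i = 1; i <= 3; i++) {
    // post the receive and return immediately; nothing blocks here
    MPI_Irecv(&all_arrays[(i-1)*n*m], n*m, MPI_DOUBLE,
              i, 50, MPI_COMM_WORLD, &reqs[i-1]);
}
// block once, until all three messages have arrived
MPI_Waitall(3, reqs, MPI_STATUSES_IGNORE);
double endcomm = MPI_Wtime();

My (possibly wrong) expectation is that this lets the three transfers overlap instead of completing one at a time.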
I checked out MPI::Comm::Gather, but I'm not sure how to use it. In particular, will it allow me to specify that process 1's array is the first array in all_arrays, process 2's array the second, etc.? Thanks.
Edit: I removed the "slow" loop, and instead put the following between the "if" blocks:
MPI_Gather(my_array, n*m, MPI_DOUBLE,
           &all_arrays[(id-1)*m*n], n*m, MPI_DOUBLE,
           0, MPI_COMM_WORLD);
The same slow performance resulted. Does this have something to do with the fact that the root process "waits" for each individual receive to complete before attempting the next one? Or is that not the right way to think about it?
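For reference, here is a minimal, self-contained sketch of how I understand MPI_Gather is meant to be used (plain C interface; the n and m values are placeholders, and the fill loop stands in for my Populate routine). Every rank, including the root, contributes one block, and as far as I can tell the standard guarantees that rank r's block lands at offset r*n*m in the receive buffer:

#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int id, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1000, m = 100;            // placeholder sizes
    double* my_array = new double[n*m];
    for (int j = 0; j < n*m; j++)           // stand-in for Populate(my_array, id)
        my_array[j] = id;

    // Root's receive buffer holds one block per rank, its own included.
    double* all_arrays = NULL;
    if (id == 0) all_arrays = new double[nprocs*n*m];

    // Collective call: every rank executes it; the receive arguments
    // only matter on the root. Rank r's block is placed at offset r*n*m.
    MPI_Gather(my_array, n*m, MPI_DOUBLE,
               all_arrays, n*m, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    delete[] my_array;
    delete[] all_arrays;
    MPI_Finalize();
    return 0;
}

If that reading is right, then my Edit above looks suspect: on rank 0, (id-1)*m*n is a negative offset, and the root should pass the start of all_arrays, since MPI places each rank's block itself. It would also mean rank 1's array ends up as the second block (after rank 0's own contribution), not the first. Someone please correct me if I've misread the semantics.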
What are the values of n and m in your program, and what kind of connection is there between your machines? – suszterpatt