2
votes

I am trying to implement a work pool here. I am supposed to send 100 different numbers in total to the slave processes. Each slave process then returns something to the master process, and the master sends another, different number to that slave. This continues until all 100 iterations are over.

My program gets kind of stuck in an infinite loop which I think is due to incorrect mapping of MPI_Send and MPI_Recv. I can't figure out what I am doing wrong. I have spent quite a few hours looking into this but to no avail. I am new to MPI and programming in general. The code is given below:

if(rank == 0) {
    int i,iteration = 0, a=0,inside=0,temp=0;
    for(i = 1; i < slaves; i++) {
        MPI_Send(&iteration,1,MPI_INT,i,0,MPI_COMM_WORLD);
        MPI_Send(&a,1,MPI_INT,i,1,MPI_COMM_WORLD);
        iteration++;
    }
    while(iteration < 100+slaves){
        MPI_Recv(&temp,1,MPI_INT,MPI_ANY_SOURCE,0, MPI_COMM_WORLD, &status);
        if(iteration < 100) {
            MPI_Send(&iteration,1,MPI_INT,status.MPI_SOURCE,0,MPI_COMM_WORLD);
            MPI_Send(&a,1,MPI_INT,status.MPI_SOURCE,1,MPI_COMM_WORLD);
        }
        iteration++;
        inside = inside + temp;
    }
}
else {
    int iteration=0,count=0;
    if(iteration < 100) {
        MPI_Recv(&iteration,1,MPI_INT,0,0,MPI_COMM_WORLD,&status);
        MPI_Recv(&count,1,MPI_INT,0,1,MPI_COMM_WORLD,&status);
        MPI_Send(&count,1,MPI_INT,0,0,MPI_COMM_WORLD);
    }
}
2
Why does the for loop start from 1 and not from 0: for(i = 1; i < slaves; i++) {? - Anto Jurković
@AntoJurković Because the master (rank = 0) is only supposed to send out data to the slaves (rank > 0). - Sohi

2 Answers

3
votes

You need to loop within your slave ranks as well. As it stands now, you send iteration and a from the master to the slave, send count back from the slave to the master, and then the master tries to send iteration and a from within a while loop, while the slaves have happily exited the else block and continued on their merry way. Either get rid of the while loop in the master process so that it doesn't send things the slaves will never receive, or add one in the slave processes so that they will properly receive that data.

2
votes

One of the most important things in MPI is to understand that, in general, every single process executes the same program. This makes mpi_rank one of your best friends, since you need it to distinguish the different tasks each process has to accomplish.

Another important point to understand is how blocking/non-blocking communication in MPI works. Here we're using blocking communication (MPI_Send() and MPI_Recv()). This means a process will stop at a function call like MPI_Recv() and wait until its communication partner reaches the matching "counterpart" (the MPI_Send() that sends the message).

The fact that your program gets stuck is a good indication that the numbers of MPI_Send() and MPI_Recv() calls do not match: somewhere a process is still waiting to receive a message or to be able to send one.
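
To make the mismatch concrete for the code in the question, here is a small sketch in plain C (no MPI needed) that just counts the calls. It assumes that slaves holds the total number of processes, so the ranks 1 to slaves-1 are the workers; the function names are made up for illustration.

```c
/* Number of MPI_Recv calls the master posts in the question's while loop.
   Assumes `slaves` is the communicator size, so the priming for loop
   leaves iteration == slaves - 1 before the while loop starts. */
int master_recvs(int slaves, int total) {
    int iteration = slaves - 1;
    int recvs = 0;
    while (iteration < total + slaves) {   /* same condition as the question */
        recvs++;                           /* one MPI_Recv per pass */
        iteration++;
    }
    return recvs;
}

/* Each slave runs its if-block once, so it sends exactly one reply. */
int slave_sends(int slaves) {
    return slaves - 1;
}

/* Receives on the master that no slave will ever satisfy. */
int unmatched_recvs(int slaves, int total) {
    int diff = master_recvs(slaves, total) - slave_sends(slaves);
    return diff > 0 ? diff : 0;
}
```

Under that assumption, with 4 processes and 100 numbers the master posts 101 receives but only 3 replies are ever sent, so it blocks in its fourth MPI_Recv.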

For your example, I'd try to do something like this:

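// Assumes mpi_rank and mpi_size were set via MPI_Comm_rank() and MPI_Comm_size().
// value, result and next are placeholder ints; use whatever your task needs.
int iterations = 0, value = 0, result = 0, next = 0;
MPI_Status status;

while( iterations < 100 ){
  // In general every process has to do something 100 times,
  // but we have to distinguish between master and slaves.

  if( mpi_rank == 0 ){
    // The master process...
    for( int slave_rank = 1; slave_rank < mpi_size; slave_rank++ ){
      // ... has to send, receive and send once again something to/from every(!) slave, ...
      MPI_Send( &value, 1, MPI_INT, slave_rank, 0, MPI_COMM_WORLD );           // one int to slave_rank
      MPI_Recv( &result, 1, MPI_INT, slave_rank, 0, MPI_COMM_WORLD, &status ); // one int from slave_rank
      MPI_Send( &next, 1, MPI_INT, slave_rank, 1, MPI_COMM_WORLD );            // another int to slave_rank
    }
  }
  else{
    // ... while the slaves just have to receive, send and receive again from/to one process (the master).
    MPI_Recv( &value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status );  // one int from master
    MPI_Send( &value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD );           // one int to master
    MPI_Recv( &next, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status );   // another int from master
  }
  iterations++;
}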

Your task sounded like: the master sends an int to slaves #1, #2, #3, ..., then it receives from #1, #2, #3, ..., then it sends another int to #1, #2, #3. You'll probably recognize that you would have to loop over all the slave ranks three times.

This solution is different (the result is the same though), but shorter: the master sends an int to slave #1, then receives an int from slave #1, then sends another int to slave #1. Afterwards, it repeats the same thing for slaves #2, #3, #4, and so on. This way we have to loop over all the slave ranks only once.