I'm facing a peculiar situation.
I have an MPI program that runs as 16 MPI processes, launched with mpirun -np 16 a.out
I want all 16 processes to run for a fixed time, say 60 seconds, after which they should all report their results to a common process (say the one with rank 0).
So process 0 will do a gather after 60 seconds. How do I ensure that all processes stop after 60 seconds?
Pseudocode:
/* All processes (except 0) do the following: */
while (1) {
    MPI_Send (to process 0)
    MPI_Recv (from process 0)
}
/* Process 0 roughly does the following: */
while (1) {
    MPI_Recv (from any other process)
    Process the request
    MPI_Send (back to clients)
}
/* After 60 seconds, stop all processes and gather results at Process 0. */
1. Catch a SIGALRM signal after 60 seconds.
2. Do a dummy MPI_Irecv (any source) to ensure that any client blocked in MPI_Send() is woken up.
3. Do an MPI_Send to all clients with a special value in the buffer telling them to terminate.
4. MPI_Gather from all clients.
Process 0 acts like a server and the rest are clients.
I tried using signal handling (SIGALRM) but the documentation says that signal handling is unsafe with MPI.
If signals cannot be used, then how do we handle this?