1
votes

Is there a way to check if some processes are waiting on MPI_Recv?

I have a root proc, and some slave processes.

Slave pseudo-code:

while (1) {
    do_some_stuff; // calls MPI_Test and clears unused buffers
    MPI_Recv(buf, ...);
    do_something_with_buf;
    MPI_Isend(buf2, ...); // possibly many sends depending on what was in buf
}

If all slave processes hang on MPI_Recv, then the job is done and I need to break the loop. Now I need some way to notify the slave processes that the job is done. Is there any way to do this? I thought there might be something like a reverse probe that checks whether anyone is waiting for a message, instead of checking whether there is a message to receive. I haven't found anything useful, though.

Edit: some more explanation.

I have one root proc, which reads a huge file and sends the data it reads to the workers (the rest of the processes). Each worker receives a portion of the data, so it is well distributed (each worker stores roughly the same amount of data). Then the workers start to communicate with each other, sending partial computations. When a worker receives a partial computation, it may produce a lot of new partial results, some of which need to be sent to other workers. The work is done when all workers have nothing to do and there are no more partial results waiting to be received.

1
When the work is finished, the root can send a special stop message to all workers. If they are finished, they send a finished message to the root and wait for a reply again. When the root has heard that all of them are finished, it sends a final finish message to all, which they can use to break the loop. - cruxion effux
The problem is that an individual process doesn't know whether it has finished. The work is done when all slave processes have nothing more to send (i.e. they all hang on recv). I had an idea to put a send before and after the recv to notify the root that the process is waiting for a message (and that it has received something), and if all processes are waiting, send a stop message to all. But this approach generates a lot of messages to and from the root, which seems very inefficient. - Borys Popławski
@Borys Popławski If the slave doesn't know it's finished, then it can't do anything to alert the root and the others, right? So the only option left is for the server to send something and check? But when the server sends the check (or stop), the slaves may not have finished, and the stop would cause them to die without finishing their work? So I think we are blocked either way? - cruxion effux
Yes, your approach may cause extra communication, but that is essential if you need synchronization. Processes in MPI are assumed to run separately, with no shared memory or anything similar that would tell us the status of other processes, so communication is the only way. - cruxion effux
Yes, slave processes can't alert the root that they have finished, but the root can alert them. Here is the whole problem: how can the root know when it's all done? The only answer I have found is to check whether all processes are waiting for messages (in which case all sent data has been processed). - Borys Popławski

1 Answer

0
votes

You should be able to avoid the situation where a receive is expected but nothing is sent. In a master/slave setup, the sending process should always keep track of how much work there is to send. Typically this master/slave strategy works with the master keeping track and killing off the slaves once the total is reached...
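One way to let the master "keep track" when workers also send to each other, as in the question, is to count messages. The sketch below shows only the decision step, in plain C with no MPI calls. `WorkerReport`, `Poll`, `summarize`, and `job_finished` are hypothetical names, not part of MPI. It assumes the counters cover only worker-to-worker messages, that an idle worker becomes busy again only by receiving a message, and that the reports are collected elsewhere (for example on a control tag), with the second polling round starting after the first has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker report, sent to the master on request. */
typedef struct {
    long sent;      /* work messages this worker has sent so far */
    long received;  /* work messages this worker has received so far */
    bool idle;      /* true if the worker has nothing left to process */
} WorkerReport;

/* Totals of one polling round over all workers. */
typedef struct {
    long sent;
    long received;
    bool all_idle;
} Poll;

/* Add up the reports collected in one polling round. */
Poll summarize(const WorkerReport *reports, size_t n) {
    assert(reports != NULL && n > 0);
    Poll p = {0, 0, true};
    for (size_t i = 0; i < n; ++i) {
        p.sent += reports[i].sent;
        p.received += reports[i].received;
        p.all_idle = p.all_idle && reports[i].idle;
    }
    return p;
}

/* A single round is not a consistent snapshot, because a message may be
 * in flight while the reports are collected. The job is treated as done
 * only if two consecutive rounds both show every worker idle, sent equal
 * to received, and unchanged totals between the rounds. */
bool job_finished(Poll previous, Poll current) {
    return previous.all_idle && current.all_idle &&
           previous.sent == previous.received &&
           current.sent == current.received &&
           previous.sent == current.sent;
}
```

The master would call `summarize` once per round and send the stop message only after `job_finished` returns true for the last two rounds. Workers would need to answer polls while waiting for work, for instance by receiving with MPI_ANY_TAG instead of blocking on the work tag alone.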

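In terms of functions, the closest equivalent to a probe on the send side may be a non-blocking send, MPI_Isend, which returns a request handle that can be passed to something like MPI_Test. MPI_Test is non-blocking and sets its flag argument to true once the send has completed. You can also call MPI_Wait on the request if you want to block the sending code until the send has completed. Note that completion of a standard-mode MPI_Isend only guarantees that the send buffer can be reused, not that the message has been received; with the synchronous variant MPI_Issend, completion means a matching receive has been posted and has started to receive the message. Using test/wait with one request per send (and, if needed, unique tags) for each send to each process would be a way to perform what you want.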
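As a minimal sketch of testing on the send side, the program below has rank 0 post a synchronous non-blocking send with MPI_Issend and poll it with MPI_Test. For a synchronous send, the flag becomes true only after rank 1 has started the matching receive. The choice of MPI_Issend rather than MPI_Isend, the payload, and the tag value are assumptions made for illustration. It needs at least two ranks (for example `mpirun -np 2`).

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int tag = 0;  /* illustrative tag value */

    if (rank == 0) {
        int payload = 42;  /* illustrative payload */
        MPI_Request req;

        /* Synchronous mode: the send completes only once the matching
         * receive has started on the destination. */
        MPI_Issend(&payload, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &req);

        int done = 0;
        while (!done) {
            /* Non-blocking check; done is set to 1 on completion. */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
            /* ... other useful work could go here ... */
        }
        printf("rank 0: rank 1 has started receiving the message\n");
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1: received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

This tells the sender that one particular message was picked up. It does not reveal whether a process is blocked in MPI_Recv with nothing sent to it, so it does not by itself detect that all workers are idle.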