0
votes

My program can generate error messages on certain MPI ranks when an error condition is met. However, it is possible that this condition is met only on some (but not all) ranks. I want to output a message from the first rank that encounters the error condition, and throw away similar messages from the other ranks.

If I did this naively (without throwing away messages), e.g.

if (error) cout << "Error on rank " << rank << endl;

I would get randomly ordered output on the screen.

I want to lock stdout for the first process that enters the if(error) block, which is complicated by the fact that not all processes enter that block. This means that an MPI_Barrier() collective would not complete. Sending all output to one processor is not really a solution, since it would require synchronization at every place in the code where an error message may be generated, and hence slow down the program. Setting aside an idle processor for the sole purpose of printing messages does not seem attractive either (this is for a community code). Writing to one file per rank is also not an option, at least if there are many ranks.

I was wondering if there is an atomic mechanism in MPI (I read that MPI-3 has one), so that I could update a flag in one processor's memory atomically, e.g. via one-sided communication, and only proceed with printing the error message when the flag is not yet set.

I'm afraid this cannot easily be accomplished with standard tricks... am I right?

UPDATE:

I think I figured out how to do it. Wesley's answer was close, but it can also be done with standard MPI-2 RMA, which is available in most MPI implementations. The key to the solution can be found in the atomic example from the "Using MPI-2" book, the code of which is also in the MPICH2 distribution (test/mpi/rma/fetchandadd.c).

Here's how you lock and atomically increment a variable (which exists on rank 0):

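if (error)
    {
    int one = 1;
    int flag;
    // Passive-target epoch on rank 0; the exclusive lock serializes the ranks.
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
    // Add 1 to the int at displacement 0 of rank 0's window.
    MPI_Accumulate(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_SUM, win);
    // Caveat: the standard does not guarantee that this Get sees the
    // Accumulate above when both are issued in the same epoch.
    MPI_Get(&flag, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    MPI_Win_unlock(0, win);  // flag is valid only after the unlock
    if (flag == 1) cout << "Error on rank " << rank << endl;
    }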

and somewhere during initialization:

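int error_flag = 0;  // must stay alive until MPI_Win_free
MPI_Win win;
// Collective over mpi_comm: every rank exposes its own error_flag.
MPI_Win_create(&error_flag, sizeof(int), sizeof(int), MPI_INFO_NULL, mpi_comm, &win);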

... and before exiting

MPI_Win_free(&win);
2
Beware when using RMA with Open MPI as it comes with some preconfigured bugs. - Hristo Iliev

2 Answers

1
votes

You can try an MPI_COMPARE_AND_SWAP.

MPI_COMPARE_AND_SWAP(origin_addr, compare_addr, result_addr, datatype, target_rank, target_disp, win)
IN  origin_addr      initial address of buffer (choice)
IN  compare_addr     initial address of compare buffer (choice)
OUT result_addr      initial address of result buffer (choice)
IN  datatype         datatype of the element in all buffers (handle)
IN  target_rank      rank of target (non-negative integer)
IN  target_disp      displacement from start of window to beginning of target buffer (non-negative integer)
IN  win              window object (handle)

It's on page 430 of the MPI-3.0 standard (there isn't an HTML version of the 3.0 standard, so I can't link to it directly). It compares the value in compare_addr with the value at the target; if they are equal, the target is replaced by the value in origin_addr. Either way, the original target value comes back in result_addr. I'm not a total RMA expert, so I can't guarantee that it gives you the semantics you're looking for with total synchronization (there's some trickiness with epochs that I'm not 100% on), but I think it should work for you.

0
votes

Instead of listening for errors on a separate idle process, look into the feasibility of doing the same on an idle thread on process 0. This isn't guaranteed to work: thread support is optional in MPI (the standard defines support levels that you request through MPI_Init_thread, and an implementation may grant less than you ask for, so check its documentation). As long as your logger thread keeps away from your main thread's memory, I'd say your chances are pretty good.