I'd like the following behavior when running an MPI script with mpi4py: when any process throws an exception, mpirun (and its spawned processes) should immediately exit with non-zero error codes. Instead, I find that execution continues on the other ranks even when one or more processes throw an exception.
I am using mpi4py 3.0.0 with OpenMPI 2.1.2. I'm running this script with

```
mpirun --verbose -mca orte_abort_on_non_zero_status 1 -n 4 python my_script.py
```

I expected the job to end immediately, before the sleep is reached, but instead the processes with rank != 0 go on to sleep:
```python
import time
import mpi4py


def main():
    import mpi4py.MPI
    mpi_comm = mpi4py.MPI.COMM_WORLD
    if mpi_comm.rank == 0:
        # Only rank 0 fails; the other ranks keep running.
        raise ValueError('Failure')
    print('{} continuing to execute'.format(mpi_comm.rank))
    time.sleep(10)
    print('{} exiting'.format(mpi_comm.rank))


if __name__ == '__main__':
    main()
```
How can I get the behavior I'd like (fail quickly if any process fails)?
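A commonly suggested workaround, sketched here under assumptions and not verified against this exact mpi4py/OpenMPI combination, is to replace `sys.excepthook` so that an uncaught exception on any rank calls `MPI.COMM_WORLD.Abort()`. That asks the MPI runtime to terminate every process in the job, not just the failing one. The helper names `make_abort_excepthook` and `install_mpi_abort_hook` are hypothetical; only `sys.excepthook`, `sys.__excepthook__`, and `Comm.Abort(errorcode)` come from Python and mpi4py.

```python
import sys


def make_abort_excepthook(abort, previous_hook=sys.__excepthook__, errorcode=1):
    """Return an excepthook that reports the error, then calls abort(errorcode).

    Hypothetical helper: `abort` is any callable taking an int error code,
    e.g. the bound method MPI.COMM_WORLD.Abort.
    """
    def hook(exc_type, exc_value, exc_traceback):
        # Print the traceback first so the failing rank's error is visible.
        previous_hook(exc_type, exc_value, exc_traceback)
        # Then ask MPI to tear down every rank in the job.
        abort(errorcode)
    return hook


def install_mpi_abort_hook(errorcode=1):
    """Hypothetical helper: make any uncaught exception abort the whole MPI job.

    The default error code of 1 is an arbitrary non-zero choice.
    """
    # Imported lazily so this module can be loaded without an MPI launcher.
    from mpi4py import MPI
    sys.excepthook = make_abort_excepthook(
        MPI.COMM_WORLD.Abort, sys.excepthook, errorcode
    )
```

To use it, call `install_mpi_abort_hook()` at the top of `main()`, before any rank can raise. Keeping the hook factory separate from MPI lets the abort logic be exercised without `mpirun`. mpi4py 3.0.0 also appears to provide a `python -m mpi4py my_script.py` runner that aborts the job on an unhandled exception; whether it resolves this case with OpenMPI 2.1.2 is an assumption worth checking.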
Thank you!