
I'm running mpi4py 2.0.0 built against OpenMPI 1.10.1 on an Ubuntu 14.04.3 system with Python 2.7.10. For some reason, attempting to send messages larger than 64 KB causes the send/recv to hang; however, I'm able to successfully send large messages on other Ubuntu 14.04 systems using the exact same software and OpenMPI/mpi4py packages. I'm also able to successfully send large messages from C programs that use OpenMPI. This suggests that something in the environment is adversely affecting the MPI communication performed by mpi4py. Any ideas as to what could be interfering with mpi4py?

Here is an example of the code that works on one system and hangs on the other when N is set to 65537 or greater.

import os
import sys

from mpi4py import MPI
import numpy as np

N = 65537

def worker():
    # intercommunicator connecting the spawned worker back to the parent
    comm = MPI.Comm.Get_parent()
    size = comm.Get_size()
    rank = comm.Get_rank()

    # buffer-based receive of the raw bytes sent by the parent
    buf = np.empty(N, np.byte)
    comm.Recv(buf=buf)

if __name__ == '__main__':
    script_file_name = os.path.basename(__file__)
    if MPI.Comm.Get_parent() != MPI.COMM_NULL:
        worker()
    else:
        # spawn a single worker process running this same script
        comm = MPI.COMM_SELF.Spawn(sys.executable,
                                   args=[script_file_name],
                                   maxprocs=1)

        # buffer-based send of N random bytes to rank 0 of the spawned
        # intercommunicator; hangs on the affected system when N > 65536
        comm.Send(np.random.randint(0, 256, N).astype(np.byte), 0)

Switching between the buffer-based Send/Recv with explicitly specified fixed-length buffers (shown above) and the pickled lowercase send/recv had no effect on the problem.
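
For reference, a minimal sketch of the pickled variant, assuming the same spawn structure as the example above:

import os
import sys

from mpi4py import MPI
import numpy as np

N = 65537

def worker():
    comm = MPI.Comm.Get_parent()
    # lowercase recv: the incoming object is unpickled automatically
    data = comm.recv(source=0)

if __name__ == '__main__':
    script_file_name = os.path.basename(__file__)
    if MPI.Comm.Get_parent() != MPI.COMM_NULL:
        worker()
    else:
        comm = MPI.COMM_SELF.Spawn(sys.executable,
                                   args=[script_file_name],
                                   maxprocs=1)

        # lowercase send: the numpy array is pickled before transmission
        comm.send(np.random.randint(0, 256, N).astype(np.byte), dest=0)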

Curiously, the problem doesn't appear to affect transmissions between peer processes using the same communicator.
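
A minimal sketch of that kind of peer-to-peer transfer (assuming "peers" means two ranks of MPI.COMM_WORLD, launched with something like mpiexec -n 2 script.py):

from mpi4py import MPI
import numpy as np

N = 65537

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # buffer-based send of N random bytes to the other rank
    comm.Send(np.random.randint(0, 256, N).astype(np.byte), dest=1)
elif rank == 1:
    buf = np.empty(N, np.byte)
    comm.Recv(buf, source=0)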

Weird. Is the C code that works an equivalent of the Python one, i.e. does it also use MPI_Comm_spawn? Are all processes running on the same host? It would help if you could attach to each Python process with GDB and produce a stack trace of the main thread. – Hristo Iliev
Problem caused by a virtual network interface, as you observed in response to a similar issue elsewhere on SO :-) – lebedov

1 Answer


Problem solved: OpenMPI was getting confused by the presence of a virtual network interface created by Docker. Removing the interface made the weirdness go away, although one can also tell OpenMPI to ignore the interface.
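
For example (a sketch, assuming the Docker bridge is named docker0): the btl_tcp_if_exclude MCA parameter can be supplied through OpenMPI's OMPI_MCA_* environment-variable mechanism, as long as it is set before MPI is initialized, i.e. before mpi4py is imported:

import os

# Exclude the Docker bridge (commonly "docker0") from OpenMPI's TCP transport.
# Setting btl_tcp_if_exclude overrides the default exclude list, so loopback
# is listed explicitly as well. This must happen before MPI_Init runs, i.e.
# before the mpi4py import below.
os.environ.setdefault("OMPI_MCA_btl_tcp_if_exclude", "lo,docker0")

from mpi4py import MPI

Equivalently, the parameter can be passed on the mpiexec command line with --mca btl_tcp_if_exclude lo,docker0.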