0 votes

We have installed mpi4py and PETSc using the Anaconda Python environment. Running the mpi4py and PETSc test code under MPI works fine, e.g.:

$ mpirun -n 4 python ./test.py

But when we run the MPI test code of OpenMDAO v2.2.0, we always get errors like:

$ mpirun -n 4 python ./test_proc_alloc.py

ERROR: test_4_subs_max2 (__main__.ProcTestCase5)

----------------------------------------------------------------------
Traceback (most recent call last):
File "./test_proc_alloc.py", line 186, in test_4_subs_max2 p = _build_model(nsubs=4, max_procs=[2,2,2,2])  File "./test_proc_alloc.py", line 47, in _build_model p.setup(vector_class=vector_class, mode=mode, check=False)
File "anaconda2/5.0.0/lib/python2.7/site-packages/openmdao/core/problem.py", line 409, in setup model._setup(comm, 'full', mode)
File "anaconda2/5.0.0/lib/python2.7/site-packages/openmdao/core/system.py", line 714, in _setup.self._setup_var_sizes(recurse=recurse)
File "anaconda2/5.0.0/lib/python2.7/site-packages/openmdao/core/group.py", line 466, in _setup_var_sizes subsys._setup_var_sizes(recurse)
File "anaconda2/5.0.0/lib/python2.7/site-packages/openmda /core/component.py", line 233, in _setup_var_sizes    self.comm.Allgather(sizes[type_][iproc, :], sizes[type_])
File "MPI/Comm.pyx", line 640, in mpi4py.MPI.Comm.Allgather (src/mpi4py.MPI.c:98562) Exception: Invalid buffer pointer, error stack: PMPI_Allgather(1093): MPI_Allgather(sbuf=0x5629c3c809e8, scount=1, MPI_LONG, rbuf=0x5629c3c809e0, rcount=1, MPI_LONG, MPI_COMM_WORLD) failed

PMPI_Allgather(1026): Buffers must not be aliased

What is causing this error? Thanks.

Could you post the content of the test.py file? - Justin Gray
test.py is just copied from petsc4py-3.8.0/demo, e.g. kspsolve/test_mat_cg.py. I get the same iterations and residuals with MPI as in serial. - jigo3635
It's possible, though unlikely, that there was a change in PETSc 3.8 that isn't currently compatible with OpenMDAO. Could you try using 3.7? - Justin Gray

1 Answer

0 votes

I'm getting the exact same error running OpenMDAO v2.6 with mpi4py v3.0.0 and PETSc v3.8.1 built on Intel MPI. The issue appears to be that newer versions of mpi4py no longer allow sending and receiving data through the same buffer (in this case in the Allgather call). The easy fix is to deep-copy the send buffer, which seems to resolve the problem. I've tested this on one of our own workflows and ran some of the OpenMDAO test scripts in parallel, which now appear to work.
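For illustration, here is a minimal sketch (not OpenMDAO's actual code; the array shape and the name sizes are just placeholders) of the aliasing pattern from the traceback and the deep-copy workaround:

# allgather_copy_demo.py -- run with e.g.: mpirun -n 4 python allgather_copy_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
nproc = comm.size
iproc = comm.rank

# One row per rank; each rank fills its own row, then gathers all rows.
sizes = np.zeros((nproc, 3), dtype=int)
sizes[iproc, :] = iproc + 1

# Problematic pattern: the send buffer is a view into the receive buffer,
# so stricter MPI implementations abort with "Buffers must not be aliased":
# comm.Allgather(sizes[iproc, :], sizes)

# Workaround: deep-copy the send buffer so it no longer aliases the receive buffer.
comm.Allgather(sizes[iproc, :].copy(), sizes)

print(iproc, sizes)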

You can see the PR here: https://github.com/OpenMDAO/OpenMDAO/pull/904. Please try checking out the branch on my fork and see if it also fixes your issue.