
So I'm trying to parallelize a Fortran 2003 program using MPI. I run it on a node with 16 processor cores and 64 GB of shared memory. The 16 ranks in my MPI communicator each have to apply some algorithm to part of a rather large array (about 6000 by 8000 elements, double precision).

Right now I use MPI_BCAST to send copies of this array from the root rank to the other 15 ranks, which takes a lot of time. Since the big array is read-only, I figured it would be faster to make it readable by all ranks using MPI_win_allocate_shared and just pass around the win object. Since it is a shared-memory node, this should work fine. (All based on the explanation I found in an answer to this topic: MPI Fortran code: how to share data on node via openMP?)
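For reference, as I understand it from that answer, every rank calls MPI_win_allocate_shared collectively, and the non-root ranks then obtain the base address of rank 0's segment with MPI_Win_shared_query and map it onto a Fortran array with C_F_POINTER. A minimal sketch of that query step (shared_array is just an illustrative name; win is the window created below):

REAL(dp), DIMENSION(:,:), POINTER :: shared_array
INTEGER(KIND=MPI_ADDRESS_KIND)    :: segment_size
INTEGER                           :: segment_disp_unit, ierr
TYPE(C_PTR)                       :: baseptr

! Ask for the base address (and size) of rank 0's segment...
CALL MPI_Win_shared_query(win, 0, segment_size, segment_disp_unit, baseptr, ierr)
! ...and map it onto a Fortran pointer array; treat it as read-only.
CALL C_F_POINTER(baseptr, shared_array, [6000, 8000])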

However, when I try to compile the program, I get the following error message:

Undefined symbols for architecture x86_64:
 "_mpi_win_allocate_shared_", referenced from:
      ___lakefinder_parallel_mpi_nam_module3_MOD_fill_sea_master in lakefinder_parallel_mpi_nam_module3.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

I have no clue why, since the program uses several other MPI calls (MPI_INIT, MPI_SEND, MPI_BCAST, etc.) and works fine with those.

Any ideas?

Here's the basic code I have (only the relevant pieces; it's part of a large climate model which I won't bother you with):

PROGRAM Main_program

USE mpi
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
USE configuration_main_module, only: dp  ! 8-byte float

IMPLICIT NONE

REAL(dp), DIMENSION(6000,8000) :: data_array
INTEGER                        :: ierr, rank, size, win, disp_unit
INTEGER(KIND=MPI_ADDRESS_KIND) :: windowsize
TYPE(C_PTR)                    :: baseptr

! Split program into ranks
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

! Open up a lot of space for the root rank and nothing for the others
IF (rank==0) THEN
  windowsize = 6000*8000*8_MPI_ADDRESS_KIND   ! window size in bytes
ELSE
  windowsize = 0
END IF

disp_unit = 1   ! displacements will be measured in bytes
CALL MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, baseptr, win, ierr)

CALL MPI_FINALIZE(ierr)

END PROGRAM Main_program
MPI_WIN_ALLOCATE_SHARED is part of MPI-3.0. You need an implementation that covers that version of the standard or later. - Hristo Iliev
Thanks, that might be it then. I just installed openmpi-2.0.0, but apparently uninstalling the 1.10.3 version didn't work correctly (now, even when I uninstall both, my code still compiles, so there's a version lying around somewhere). - user6593631
Your code is not Fortran 90. The intrinsic module iso_c_binding was defined much later in Fortran 2003. - jlokimlin
Open MPI 1.10.3 does support MPI-3.0. Your problem is something else then. - Hristo Iliev
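A quick way to check which MPI standard version the library actually being linked implements is MPI_GET_VERSION; a minimal self-contained check (the program name is illustrative):

PROGRAM check_mpi_version
USE mpi
IMPLICIT NONE
INTEGER :: ver, subver, ierr
CALL MPI_INIT(ierr)
CALL MPI_GET_VERSION(ver, subver, ierr)
! MPI_WIN_ALLOCATE_SHARED requires ver >= 3
PRINT *, 'Linked MPI library implements MPI standard ', ver, '.', subver
CALL MPI_FINALIZE(ierr)
END PROGRAM check_mpi_version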

1 Answer


The following code compiles and runs without error with Intel Fortran 16.0.2 and MPICH 3.2. Notice that I touch the allocated buffer and free the window before exiting. I also chose array dimensions that don't use a huge amount of memory.

You might run your test with smaller array dimensions and scale up until it fails, to see whether the issue is inadequate shared-memory resources; that is quite common with the SysV (System V) shared-memory API, but not with the POSIX shared-memory API.
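For instance, a quick way to scale the test without recompiling is to take the dimensions from the command line; a sketch (nx and ny are illustrative names that would replace the hard-coded 60 and 80 below):

INTEGER :: nx, ny, stat
CHARACTER(LEN=32) :: arg

! Default to the small test size, overridable as e.g.: mpiexec -n 16 ./a.out 6000 8000
nx = 60
ny = 80
CALL GET_COMMAND_ARGUMENT(1, arg, STATUS=stat)
IF (stat == 0) READ(arg, *) nx
CALL GET_COMMAND_ARGUMENT(2, arg, STATUS=stat)
IF (stat == 0) READ(arg, *) ny
windowsize = INT(nx, MPI_ADDRESS_KIND) * INT(ny, MPI_ADDRESS_KIND) * 8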

If it is not a resource issue, it is likely a bug in Open MPI; please report it to the Open MPI developers and switch to MPICH in the meantime. For what it's worth, I haven't seen any MPI-3 shared-memory issues with Open MPI, but I don't test it as much as others do, and I never use RMA from Fortran.

PROGRAM Main_program

USE mpi
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY : REAL64

IMPLICIT NONE

REAL(kind=REAL64), DIMENSION(:,:), POINTER :: data_array   ! associated with the window buffer below
INTEGER                        :: ierr, rank, size, win, disp_unit
INTEGER(KIND=MPI_ADDRESS_KIND) :: windowsize
TYPE(C_PTR)                    :: baseptr

! Split program into ranks
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

! Allocate the whole window on the root rank and nothing on the others
IF (rank==0) THEN
  windowsize = 60*80*8_MPI_ADDRESS_KIND   ! window size in bytes
ELSE
  windowsize = 0
END IF

disp_unit = 1
CALL MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, baseptr, win, ierr)
if (ierr .ne. MPI_SUCCESS) call MPI_Abort(MPI_COMM_WORLD, 1, ierr)   ! errorcode comes before ierror
CALL MPI_Win_lock_all(0, win, ierr)
IF (rank==0) THEN
  ! Map the buffer the window allocated onto data_array and touch every element
  CALL C_F_POINTER(baseptr, data_array, [60, 80])
  data_array = 0
END IF
CALL MPI_Win_unlock_all(win, ierr)
CALL MPI_Win_free(win, ierr)

CALL MPI_FINALIZE(ierr)

END PROGRAM Main_program