
I am dealing with a Fortran code with MPI parallelization, and I'm running out of memory on intensive runs. I'm careful to allocate nearly all of the required memory at the beginning of the simulation. Static memory allocated inside subroutines is typically small, and if I were to run out of memory because of those subroutines, it would happen early in the simulation, since that allocation should not grow as time progresses. My issue is that far into the simulation I run into memory errors such as:

Insufficient memory to allocate Fortran RTL message buffer, message #174 = hex 000000ae.

The only thing I can think of is that my MPI calls are using memory that I cannot preallocate at the beginning of the simulation. While the simulation is running I'm using mostly MPI_Allreduce, MPI_Alltoall, and MPI_Alltoallv, and sometimes I am passing large amounts of data. Could the memory issues be a result of internal buffers created by MPI? How can I prevent a surprise memory issue like this? Can these internal buffers grow during the simulation?

I've run the code under Valgrind and, aside from the usual noisy MPI warnings, I'm not seeing any other memory issues.


1 Answer


It's hard to tell whether MPI is at fault here without knowing more details. You can try Massif (one of the Valgrind tools) to find out where memory is being allocated over time, for example by running each rank as valgrind --tool=massif ./your_program and then inspecting the resulting massif.out.<pid> files with ms_print.

Be sure you don't introduce any resource leaks: if you create new MPI resources during the run (communicators, groups, derived datatypes, requests, etc.), make sure to release them with the matching MPI_..._free calls, as in the sketch below.
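For illustration, here is a minimal sketch (the subroutine and variable names are hypothetical, not taken from your code) of a per-step sub-communicator that must be freed:

    ! Hypothetical sketch: a sub-communicator created every time step.
    ! Without the MPI_Comm_free at the end, each call leaks a communicator
    ! (plus whatever internal state the MPI library attaches to it), so
    ! memory grows slowly as the simulation runs.
    subroutine step_reduce(color, val_local, val_global, ierr)
      use mpi
      implicit none
      integer, intent(in)           :: color
      double precision, intent(in)  :: val_local
      double precision, intent(out) :: val_global
      integer, intent(out)          :: ierr
      integer :: subcomm, rank

      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      ! New resource created here ...
      call MPI_Comm_split(MPI_COMM_WORLD, color, rank, subcomm, ierr)

      call MPI_Allreduce(val_local, val_global, 1, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, subcomm, ierr)

      ! ... and released here, once it is no longer needed.
      call MPI_Comm_free(subcomm, ierr)
    end subroutine step_reduce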

In general, be aware of the buffer sizes required for all-to-all communication, especially at large scale. Use MPI_IN_PLACE where possible, or send the data in smaller chunks rather than as one large block; a sketch of an in-place reduction follows.
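As a minimal sketch (the array name and size are placeholders), an in-place MPI_Allreduce avoids allocating a second buffer of the same size as the data:

    ! Hypothetical sketch: reduce an array in place.
    ! With MPI_IN_PLACE the result overwrites 'a' on every rank, so no
    ! separate n-sized receive buffer is needed for the reduction.
    subroutine reduce_in_place(a, n, ierr)
      use mpi
      implicit none
      integer, intent(in)             :: n
      double precision, intent(inout) :: a(n)
      integer, intent(out)            :: ierr

      call MPI_Allreduce(MPI_IN_PLACE, a, n, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, MPI_COMM_WORLD, ierr)
    end subroutine reduce_in_place

Similarly, splitting a very large MPI_Alltoall/MPI_Alltoallv into several calls over slices of the array bounds how much temporary buffer space the library needs at any one time.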