
I have an MPI program in which the worker ranks (rank != 0) make a large number of MPI_Send calls and the master rank (rank == 0) receives all of these messages. However, I run into a fatal error: MPI_Recv(...) failed, Out of memory.

Here is the code that I am compiling in Visual Studio 2010. I run the executable like so:

mpiexec -n 3 MPIHelloWorld.exe

#include <mpi.h>

int main(int argc, char* argv[]){
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    if(rank == 0){
        // Master: receive one million doubles from each worker, one worker at a time
        for(int k=1; k<numprocs; k++){
            for(int i=0; i<1000000; i++){
                double x;
                MPI_Recv(&x, 1, MPI_DOUBLE, k, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
        }
    }
    else{
        // Workers: send one million doubles to the master
        for(int i=0; i<1000000; i++){
            double x = 5;
            MPI_Send(&x, 1, MPI_DOUBLE, 0, i, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

If I run with only 2 processes, the program does not crash. So the problem seems to appear when MPI_Send calls accumulate from a third rank (i.e., a second worker rank).

If I decrease the number of iterations to 100,000, I can run with 3 processes without crashing. However, the amount of data each worker sends over one million iterations is only ~8 MB (8 bytes per double * 1,000,000 iterations), so I don't think the "Out of memory" refers to running out of physical memory like RAM.

Any insight is appreciated, thanks!

For that particular question it is extremely important to know which MPI implementation you are using and in what configuration. - Zulan
Using MS-MPI v 7.1 on Windows 7 - Alyshan Jahani

1 Answer


The MPI_Send operation stores the data in a system buffer until it can be delivered. The size of this buffer and where it lives are implementation specific (I remember hearing that it can even be in the interconnect hardware). In my case (Linux with MPICH) I don't get a memory error. One way to control this buffering explicitly is to attach your own buffer with MPI_Buffer_attach and send with MPI_Bsend, as sketched below. There may also be a way to change the system buffer size (e.g. the MP_BUFFER_MEM environment variable on IBM systems).
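As a rough illustration (not tested against MS-MPI, and the buffer size is just an assumption sized for this example's backlog), the worker branch of your main() could attach an explicit buffer and use buffered sends:

#include <stdlib.h>

// Sketch: attach an explicit buffer for buffered sends on the worker ranks.
// Sizing it as payload plus per-message overhead for all one million messages
// is an assumption for illustration; size it to the backlog you actually expect.
const int count = 1000000;
int bufsize = count * (sizeof(double) + MPI_BSEND_OVERHEAD);
char* buf = (char*)malloc(bufsize);
MPI_Buffer_attach(buf, bufsize);

for(int i=0; i<count; i++){
    double x = 5;
    MPI_Bsend(&x, 1, MPI_DOUBLE, 0, i, MPI_COMM_WORLD);
}

// MPI_Buffer_detach blocks until all buffered messages have been delivered.
MPI_Buffer_detach(&buf, &bufsize);
free(buf);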

However, this situation of a large backlog of unreceived messages should probably not occur in practice. In your example above, the order of the k and i loops in the receiving rank could be swapped to prevent this build-up of messages, as shown below.
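For example (a minimal sketch based on the code in the question), the receiver then drains one message from every worker per iteration instead of a full million from one worker before touching the next; the worker loops stay unchanged:

if(rank == 0){
    for(int i=0; i<1000000; i++){
        for(int k=1; k<numprocs; k++){
            double x;
            // Tag i still matches the i-th send from worker k
            MPI_Recv(&x, 1, MPI_DOUBLE, k, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
}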