
I have a problem with an MPI code in C.

I think I have a correct algorithm for processing a double loop over a 2D array. But when I try to use MPI_Gather to collect the data from the processes, I get a segmentation fault. Here is the code:

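#include <math.h>    /* fmin, fmax, fabs */
#include <stdio.h>   /* printf */
#include <string.h>  /* memset */
#include <mpi.h>

#define NN 4096
#define NM 4096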

double global[NN][NM];

void range(int n1, int n2, int nprocs, int irank, int *ista, int *iend){
    int iwork1;
    int iwork2;
    iwork1 = ( n2 - n1 + 1 ) / nprocs;
    iwork2 = ( ( n2 - n1 + 1 ) % nprocs );
    *ista = irank * iwork1 + n1 + fmin(irank, iwork2);
    *iend = *ista + iwork1 - 1;
    if ( iwork2 > irank )
        *iend = *iend + 1;  /* update the value, not the pointer */
}

void runCalculation(int n, int m, int argc, char** argv)
{
    const int iter_max = 1000;

    const double tol = 1.0e-6;
    double error     = 1.0;

    int rank, size;
    int start, end;

    MPI_Init( &argc, &argv );

    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    if (size != 16) MPI_Abort( MPI_COMM_WORLD, 1 );

    memset(global, 0, n * m * sizeof(double));

    if(rank == 0){
        for (int j = 0; j < n; j++)
        {
            global[j][0] = 1.0;
        }
    }

    int iter = 0;

    while ( error > tol && iter < iter_max )
    {
        error = 0.0;

        MPI_Bcast(global, NN*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD); 

        if(iter == 0)
            range(1, n, size, rank, &start, &end);

        int size = end - start;

        double local[size][NM];
        memset(local, 0, size * NM * sizeof(double));

        for( int j = 1; j < size - 1; j++)
        {   
            for( int i = 1; i < m - 1; i++ )
            {   
                local[j][i] = 0.25 * ( global[j][i+1] + global[j][i-1]
                                + global[j-1][i] + global[j+1][i]);
                error = fmax( error, fabs(local[j][i] - global[j][i]));
            }
        }

        MPI_Gather(&local[0][0], size*NM, MPI_DOUBLE, &global[0][0], NN*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        printf("%d\n", iter);

        if(iter % 100 == 0) 
            printf("%5d, %0.6f\n", iter, error);

        iter++;
    }

    MPI_Finalize();

}

I run this with 4096x4096 arrays. On the process with rank 0, a segmentation fault occurs at the MPI_Gather line. I checked that the sizes of the local arrays are correct, and they look fine to me.

Edit: I added the line that initializes local. The new segmentation fault is:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x10602000
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 19216 on machine_name exited on signal 11 (Segmentation fault).
Your local array is not initialized. That may be causing the trouble. - Mirakurun
Oh, thank you! I added the line. Now I still get segmentation faults, but on other ranks. - Vincent ROSSIGNOL
Can you please tell me the value of int size? It could be over 4096 and thus overflow the array. - Mirakurun
Oh no, size = 16. I added a test to be sure: if (size != 16) MPI_Abort( MPI_COMM_WORLD, 1 ); - Vincent ROSSIGNOL

1 Answer


The recvcount parameter of MPI_Gather indicates the number of items it receives from each process, not the total number of items it receives.

MPI_Gather(&local[0][0], size*NM, MPI_DOUBLE, &global[0][0], NN*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Should be:

MPI_Gather(&local[0][0], size*NM, MPI_DOUBLE, &global[0][0], size*NM, MPI_DOUBLE, 0, MPI_COMM_WORLD);
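To see why the original call overruns the buffer: the root must have room for recvcount items from every rank, so with recvcount = NN*NM and 16 ranks it tries to store 16 * NN * NM doubles in an array that holds only NN * NM. Note also that MPI_Gather with size*NM on both sides only works when every rank contributes the same number of rows. If the rows do not divide evenly, MPI_Gatherv takes a per-rank count and displacement instead. The following is a minimal sketch of how those two arrays could be computed, assuming a 0-based, inclusive row split in the spirit of the question's range function. The names block_range and gatherv_layout are hypothetical helpers, not part of MPI, and the code is plain C so the layout can be checked without running MPI.

```c
/* Hypothetical helper: inclusive row range [*ista, *iend] owned by rank
   irank when rows n1..n2 are split as evenly as possible over nprocs ranks. */
void block_range(int n1, int n2, int nprocs, int irank, int *ista, int *iend)
{
    int base  = (n2 - n1 + 1) / nprocs;        /* rows every rank gets */
    int extra = (n2 - n1 + 1) % nprocs;        /* first `extra` ranks get one more */
    int shift = irank < extra ? irank : extra; /* integer min(irank, extra) */

    *ista = n1 + irank * base + shift;
    *iend = *ista + base - 1;
    if (irank < extra)
        *iend += 1;
}

/* Hypothetical helper: fill counts[r] and displs[r] for every rank r,
   both measured in elements (not rows). Returns the total number of
   elements the root's receive buffer must be able to hold. */
long gatherv_layout(int nrows, int ncols, int nprocs, int *counts, int *displs)
{
    long total = 0;
    for (int r = 0; r < nprocs; r++) {
        int first, last;
        block_range(0, nrows - 1, nprocs, r, &first, &last);
        counts[r] = (last - first + 1) * ncols; /* elements sent by rank r */
        displs[r] = first * ncols;              /* offset of rank r in the root buffer */
        total += counts[r];
    }
    return total;
}
```

With these arrays, the gather would presumably look like MPI_Gatherv(&local[0][0], counts[rank], MPI_DOUBLE, &global[0][0], counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD), where local holds exactly counts[rank] / NM rows. The returned total should never exceed NN * NM, which is a cheap check to make before calling into MPI.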