
I am a beginner in MPI and this code seems to generate a segmentation fault.

int luDecomposeP(double *LU, int n)
{
    int i, j, k;
    int sendcount, recvcount, remaining, rank, numProcs, status;
    double *row, *rowFinal, *start, factor;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    row = (double *)malloc(n*sizeof(double));
    rowFinal = (double *)malloc(n*n*sizeof(double));

    for(i=0; i<n-1; i++)
    {
        if(rank == 0)
        {
            status = pivot(LU,i,n);

            for(j=0; j<n; j++)
                row[j] = LU[n*i+j];
        }

        MPI_Bcast(&status, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if(status == -1)
            return -1;

        MPI_Bcast(row, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        sendcount = (n-i-1)/numProcs;
        recvcount = (n-i-1)/numProcs;
        remaining = (n-i-1)%numProcs;

        if(rank == 0)
            start = LU + n*(i+1);
        else
            start = NULL;

        MPI_Scatter(start, sendcount*n, MPI_DOUBLE, rowFinal, recvcount*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for(j=0; j<recvcount; j++)
        {
            factor = rowFinal[n*j+i]/row[i];

            for(k=i+1; k<n; k++)
                rowFinal[n*j+k] -= row[k]*factor;

            rowFinal[n*j+i] = factor;
        }

        MPI_Gather(rowFinal, recvcount*n, MPI_DOUBLE, start, sendcount*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if(rank == 0)
        {
            int ctr = 0;

            while(ctr<remaining)
            {
                int index = sendcount*numProcs + ctr + i + 1;

                factor = LU[n*index+i]/row[i];

                for(k=i+1; k<n; k++)
                    LU[n*index+k] -= row[k]*factor;

                LU[n*index+i] = factor;

                ctr++;
            }
        }
    }

    free(row);
    free(rowFinal);

    return 0;
}

This code results in a segmentation fault. I have read many answers and tried to fix it, but without success. I read about the problem of dereferencing a NULL pointer, which I fixed by using a pointer called start, but the segmentation faults still keep showing up.

The error:

    [sheshnag:32334] *** Process received signal ***
    [sheshnag:32334] Signal: Segmentation fault (11)
    [sheshnag:32334] Signal code: Address not mapped (1)
    [sheshnag:32334] Failing at address: 0x44000098
    [sheshnag:32334] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x2b082eafe8f0]
    [sheshnag:32334] [ 1] /usr/lib/openmpi/lib/libmpi.so.0(MPI_Comm_rank+0x5e) [0x2b082d5ff6ee]
    [sheshnag:32334] [ 2] ./libluDecompose.so(luDecomposeP+0x2f) [0x2b082d17ea2f]
    [sheshnag:32334] [ 3] _tmp/bench.mpi.exe(main+0x2e7) [0x40b61d]
    [sheshnag:32334] [ 4] /lib/libc.so.6(__libc_start_main+0xfd) [0x2b082ed2ac4d]
    [sheshnag:32334] [ 5] _tmp/bench.mpi.exe() [0x40ac49]

I was not able to reproduce a segmentation fault with your code... but I had to add a closing curly bracket } before free(row); to close for(i=0; i<n-1; i++){. Is there any chance that could solve your problem? - francis
Sorry, that was a mistake in copying the code. I have edited it to reflect the proper code. I have also added the error output. Does it help in pinpointing the segmentation fault? - p_kajaria

1 Answer


From the stack trace you have reported, it seems the segmentation fault happens in the call to MPI_Comm_rank().

I see two possible problems:

  • MPI_Init() is missing. Usually MPI explicitly reports when it is missing, but it is possible that your MPI implementation simply crashes instead. MPI_Init() must be invoked before any other MPI call (and MPI_Finalize() must be called before exiting).

  • Broken MPI install. Does a simple MPI "hello world" program work correctly?

Oh yes... a third option:

  • The call happens with a corrupted stack (corrupted by instructions preceding the call to luDecomposeP()): MPI_Comm_rank() is the first operation here that writes to a stack variable.
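To check the first two options at once, you can try a minimal MPI "hello world" along these lines (a sketch, not your code: compile with mpicc, launch with mpirun):

```c
/* Minimal MPI sanity check: MPI_Init() must come before any other
 * MPI call, and MPI_Finalize() before the program exits. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, numProcs;

    MPI_Init(&argc, &argv);                 /* required before MPI_Comm_rank() */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    printf("hello from rank %d of %d\n", rank, numProcs);

    MPI_Finalize();                         /* required before returning */
    return 0;
}
```

If this segfaults too, the MPI installation itself is suspect; if it runs fine but your program still crashes, check that luDecomposeP() is only ever called between MPI_Init() and MPI_Finalize() in the benchmark driver.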