2
votes

I am writing a program that multiplies two matrices, A and B, which are stored in text files and whose sizes can vary, so the program has to work out the size of each matrix, determine whether they can be multiplied, and so on.

That part is not the problem. The real trouble comes when I pass data from the master process to the slave processes. The master sends rows to the slaves, and the number of rows each slave gets depends on the number of rows in the matrix and the number of processes.
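To make the row split concrete, here is a minimal sketch of one common scheme: every slave gets the integer average, and the first few slaves take one extra row each until the remainder is used up. This is an assumption about the distribution, not necessarily what the full program does. `RowRange` and `rowRangeFor` are hypothetical names; `averageRows`, `extraRows`, `offset` and `rows` simply mirror the variables declared in the slave function.

```c
#include <assert.h>

/* Hypothetical helper type: the block of rows of A handled by one slave. */
typedef struct {
    int offset; /* index of the first row assigned to this slave */
    int rows;   /* number of rows assigned to this slave */
} RowRange;

/* slaveIndex is 0-based among the slaves (e.g. MPI rank - 1 when rank 0 is the master). */
RowRange rowRangeFor(int totalRows, int slaves, int slaveIndex)
{
    assert(slaves > 0 && slaveIndex >= 0 && slaveIndex < slaves);

    int averageRows = totalRows / slaves; /* rows every slave gets */
    int extraRows   = totalRows % slaves; /* leftover rows, one each to the first slaves */

    RowRange range;
    range.rows   = averageRows + (slaveIndex < extraRows ? 1 : 0);
    range.offset = slaveIndex * averageRows
                 + (slaveIndex < extraRows ? slaveIndex : extraRows);
    return range;
}
```

With 10 rows and 3 slaves this gives blocks of 4, 3 and 3 rows starting at rows 0, 4 and 7.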

Matrix A is stored by rows, but matrix B is stored by columns.

matrixA[ 0 ] ----------------

matrixA[ 1 ] ----------------

matrixA[ 2 ] ----------------

matrixB[ 0 ] matrixB[ 1 ] matrixB[ 2 ] .........

|           |         |     |
|           |         |     |
|           |         |     |    

You can find the text files here (for input): matrixA matrixB.

After several days of 80's-style debugging (meaning no debugger at all), I think the problem (the segmentation fault I get as output) is in these lines of code (from the slave function):

void slave( int id, int slaves, double **matrixA, double **matrixB, double **matrixC )
{
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
    MPI_Status status;

    /* Receives columns of A and B from master. */
    type = 3;

    MPI_Recv( &columnsA, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rowsA, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &columnsB, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rowsB, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    printf( "%d slave recieved ColumnA = %d, RowsA = %d, ColumnB = %d, RowsB = %d.\n", id, columnsA, rowsA, columnsB, rowsB );


    /* Receive from master. */
    type = 0;

    MPI_Recv( &offset, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &rows, 1, MPI_INT, 0, type, MPI_COMM_WORLD, &status );

    matrixAllocate( &matrixA, columnsA, rows );
    matrixAllocate( &matrixB, rowsB, columnsB );
    matrixAllocate( &matrixC, columnsB, rows );
    printf( "Correctly allocated.\n" );

    /* This part is only to see if the mem was correctly allocated.*/
    for( int i = 0; i < rows; i++ ){
        for( int j = 0; j < columnsA; j++)
            matrixA[ i ][ j ] = i + j;
    }

    for( int i = 0; i < columnsB; i++ ){
        for( int j = 0; j < rowsB; j++)
            matrixB[ i ][ j ] = i * j;
    }

    if ( id == 1 ){
        matrixPrinter( "matrixA", matrixA, rows, columnsA );
        matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
        matrixPrinter( "matrixC", matrixC, rows, columnsB );
    }

    MPI_Recv( &matrixA, ( rows * columnsA ) , MPI_DOUBLE, 0, type, MPI_COMM_WORLD, &status );
    MPI_Recv( &matrixB, ( rowsB * columnsB ), MPI_DOUBLE, 0, type, MPI_COMM_WORLD, &status );
    printf( "Correctly recieved.\n" );

    matrixPrinter( "matrixA", matrixA, rows, columnsA );
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
    matrixPrinter( "matrixC", matrixC, rows, columnsB );

    if ( id == 1 ){
        printf( "My id is %d.\n", id );
        for ( int i = 0; i < rows; i++ ){
            for( int j = 0; j < columnsA; j++ ){
                printf( "%lf    ", matrixA[ i ][ j ] );
            }
            printf( "\n" );
        }
    }
}

The whole code can be found here. MPI matrix multiplier in C.

The output of the terminal is:

[screenshot of the terminal output]

It seems rather unfriendly to include the (input) text as images. What if someone actually wanted to look at what this does? - sehe
I added the text files used for input. - Alberto Bonsanto
When I encounter segfaults, I run the program under gdb. When the segfault occurs, I type bt, which prints the stack trace; then it's very easy to spot the problem. - stdcall
@Mellowcandle I tried doing that, but MPI is kinda hard to "just" gdb or valgrind :) Did you? Start with sudo -E mpirun -np 2 xterm -e "gdb test multiply A B C", or use Bullet 6 in these debugging FAQ. - sehe

2 Answers

6
votes

The problem is that the matrix is of type "double **", as allocated in "matrixAllocate". When sending and receiving data, MPI assumes that buf holds the data contiguously as a 1-d array, which is not the case here. (You can easily check that by printing out the address of each matrix entry.)

I think it's a famous pitfall in C: pointers and arrays are different. If the matrix were a true 2-d array, all the entries would be laid out contiguously.
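The address check can also be done in code rather than by reading printed pointers. A minimal sketch: `rowsAreContiguous` is a hypothetical helper, not part of the posted program. If "matrixAllocate" allocates each row separately (an assumption, since its body isn't shown in the question), this will typically report that the rows are not contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every row of a rows x cols "double **" matrix starts exactly
   cols doubles after the start of the previous row, i.e. the entries form
   one contiguous block that could be handed to MPI as a 1-d buffer.
   Returns 0 as soon as a gap (or any other layout) is found. */
int rowsAreContiguous(double **m, int rows, int cols)
{
    assert(m != NULL && rows > 0 && cols > 0);

    for (int i = 1; i < rows; i++) {
        if (m[i] != m[i - 1] + cols)
            return 0;
    }
    return 1;
}
```

Calling it on matrixA right after allocation, e.g. `rowsAreContiguous(matrixA, rows, columnsA)`, tells you whether the data could be treated as a single buffer at all.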

My suggestion is to allocate the matrix as a 1-d array and not use multidimensional subscripts.
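A minimal sketch of that suggestion, assuming row-major storage. `matrixAllocateFlat` and `flatIndex` are made-up names for illustration, not functions from the posted code, and the MPI call is shown only in a comment so the snippet compiles without MPI.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical replacement for matrixAllocate: one contiguous, zero-filled
   block of rows * cols doubles. Returns NULL if the allocation fails. */
double *matrixAllocateFlat(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    return calloc((size_t)rows * (size_t)cols, sizeof(double));
}

/* Row-major position of element (i, j) in a matrix with `cols` columns. */
size_t flatIndex(int i, int j, int cols)
{
    return (size_t)i * (size_t)cols + (size_t)j;
}

/* In slave(), the receive would then look roughly like this. Note that the
   buffer itself is passed, not the address of the pointer variable:

       double *a = matrixAllocateFlat(rows, columnsA);
       MPI_Recv(a, rows * columnsA, MPI_DOUBLE, 0, type,
                MPI_COMM_WORLD, &status);
       double value = a[flatIndex(i, j, columnsA)];
*/
```

The master has to send from a buffer with the same layout for the element order to match on both sides.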

1
votes

I hate to post an answer like this without digging through all of your MPI code, but I would suggest compiling with the -Wall flag in the future. It may help pick up an error like this. For MPI and anything computation-related you almost always want -Wall.

Look at the output and the list of warnings from your code:

$ mpic++ test.cpp -Wall -o  test
test.cpp:30:63: warning: unused variable 'rank' [-Wunused-variable]
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                              ^
test.cpp:30:69: warning: unused variable 'source' [-Wunused-variable]
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                                    ^
test.cpp:126:50: warning: variable 'matrixC' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                                                 ^~~~~~~
test.cpp:34:21: note: initialize the variable 'matrixC' to silence this warning
           **matrixC;
                    ^
                     = NULL
test.cpp:126:41: warning: variable 'matrixB' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                                        ^~~~~~~
test.cpp:33:21: note: initialize the variable 'matrixB' to silence this warning
           **matrixB,
                    ^
                     = NULL
test.cpp:85:44: warning: variable 'rc' is uninitialized when used here [-Wuninitialized]
                MPI_Abort( MPI_COMM_WORLD, rc );
                                           ^~
test.cpp:30:53: note: initialize the variable 'rc' to silence this warning
    int lineA, lineB, columnA, columnB, id, size, rc, slaves, rank, source;
                                                    ^
                                                     = 0
test.cpp:126:32: warning: variable 'matrixA' is uninitialized when used here [-Wuninitialized]
            slave( id, slaves, matrixA, matrixB, matrixC );
                               ^~~~~~~
test.cpp:32:21: note: initialize the variable 'matrixA' to silence this warning
    double **matrixA,
                    ^
                     = NULL
test.cpp:398:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixA", matrixA, rows, columnsA );
                   ^
test.cpp:399:21: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
                    ^
test.cpp:400:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixC", matrixC, rows, columnsB );
                   ^
test.cpp:407:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixA", matrixA, rows, columnsA );
                   ^
test.cpp:408:21: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixBPrinter( "matrixB", matrixB, rowsB, columnsB );
                    ^
test.cpp:409:20: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    matrixPrinter( "matrixC", matrixC, rows, columnsB );
                   ^
test.cpp:363:70: warning: unused variable 'averageRows' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                                     ^
test.cpp:363:83: warning: unused variable 'extraRows' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                                                  ^
test.cpp:363:49: warning: unused variable 'Btype' [-Wunused-variable]
    int type, columnsA, columnsB, rowsA, rowsB, Btype, offset, rows, averageRows, extraRows;
                                                ^
15 warnings generated.