
I have successfully programmed a matrix-matrix multiplication on a single node, and now my aim is to extend that program to run in parallel on cluster nodes.

The main work is a modification of the Netlib ScaLAPACK source code: I replaced the part of the original code that computes the matrix-matrix multiplication (the call to dgemm_) with my own routine (mydgemm).

The original code is a C program, but every routine in it calls a Fortran routine (dgemm_, for example, is Fortran), while my program (mydgemm) is written in C.

After my modification, the program executes successfully on a single node for any matrix size, but when I run it on 4 nodes with a matrix size larger than 200, I get an error in the MPI communication of data between nodes.

This is the error:

BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

PID 69754 RUNNING AT localhost.localdomain

EXIT CODE: 11

CLEANING UP REMAINING PROCESSES

YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

I only use MPI in the main function, to create random matrices on each node (attached below); the routine that is called is new_pdgemm(...), and I modified the code inside new_pdgemm.

Inside mydgemm.c I use OpenMP for parallelism, and this code runs as the computational kernel.

  • Could you give me a guide or an idea for solving my problem?

  • Do you think the problem is that Fortran is column-major while C is row-major? (See the indexing sketch after this list.)

  • Or do I need to rewrite mydgemm.c as mydgemm.f (which is really hard, and maybe I can't do it)?
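
Regarding the column-major question, here is a minimal sketch of the layout difference, assuming an rA x cA local block as in the code below; the accessor names are purely illustrative, not from any library:

#include <stddef.h>

/* Fortran/ScaLAPACK convention: the local block is column-major, so
   element (i,j) of a block with leading dimension lda is A[j*lda + i]. */
double get_colmajor(const double *A, int lda, int i, int j)
{
    return A[(size_t)j * lda + i];
}

/* Typical hand-written C kernel: row-major, element (i,j) is A[i*ncols + j].
   Note that mixing the two conventions on a correctly sized buffer usually
   produces wrong results rather than a crash, since the indices stay in
   bounds; a segfault more often points at memory corruption elsewhere. */
double get_rowmajor(const double *A, int ncols, int i, int j)
{
    return A[(size_t)i * ncols + j];
}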

My code:

int main(int argc, char **argv) {
   int i, j, k;
/************  MPI ***************************/
   int myrank_mpi, nprocs_mpi;
   MPI_Init( &argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);
/************  BLACS ***************************/
   int ictxt, nprow, npcol, myrow, mycol,nb;
   int info,itemp;
   int _ZERO=0,_ONE=1;
     int M=20000;
     int K=20000;
     int N=20000;
   nprow = 2; npcol = 2; 
     nb=1200;

   Cblacs_pinfo( &myrank_mpi, &nprocs_mpi ) ;
   Cblacs_get( -1, 0, &ictxt );
   Cblacs_gridinit( &ictxt, "Row", nprow, npcol );
   Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );
   //printf("myrank = %d\n",myrank_mpi);


   int rA = numroc_( &M, &nb, &myrow, &_ZERO, &nprow );
   int cA = numroc_( &K, &nb, &mycol, &_ZERO, &npcol );
   int rB = numroc_( &K, &nb, &myrow, &_ZERO, &nprow );
   int cB = numroc_( &N, &nb, &mycol, &_ZERO, &npcol );
   int rC = numroc_( &M, &nb, &myrow, &_ZERO, &nprow );
   int cC = numroc_( &N, &nb, &mycol, &_ZERO, &npcol );

   double *A = (double*) malloc(rA*cA*sizeof(double));
   double *B = (double*) malloc(rB*cB*sizeof(double));
   double *C = (double*) malloc(rC*cC*sizeof(double));

   int descA[9],descB[9],descC[9];

     descinit_(descA, &M,   &K,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rA,  &info);
     descinit_(descB, &K,   &N,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rB,  &info);
     descinit_(descC, &M,   &N,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rC,  &info);

   double alpha = 1.0; double beta = 1.0;   
    double start, end, flops;
     srand(time(NULL)*myrow+mycol);
     #pragma simd
     for (j=0; j<rA*cA; j++)
     {
         A[j]=((double)rand()-(double)(RAND_MAX)*0.5)/(double)(RAND_MAX);
    //   printf("A in myrank: %d\n",myrank_mpi);
     }
//   printf("A: %d\n",myrank_mpi);
     #pragma simd
     for (j=0; j<rB*cB; j++)
     {
         B[j]=((double)rand()-(double)(RAND_MAX)*0.5)/(double)(RAND_MAX);
     }
     #pragma simd
     for (j=0; j<rC*cC; j++)
     {
         C[j]=((double)rand()-(double)(RAND_MAX)*0.5)/(double)(RAND_MAX);
     }
     MPI_Barrier(MPI_COMM_WORLD);

  start=MPI_Wtime();

    new_pdgemm ("N", "N", &M , &N , &K , &alpha, A , &_ONE, &_ONE , descA , B , &_ONE, &_ONE , descB , &beta , C , &_ONE, &_ONE , descC );
MPI_Barrier(MPI_COMM_WORLD);
     end=MPI_Wtime();

     if (myrow==0 && mycol==0)
     {
        flops = 2 * (double) M * (double) N * (double) K / (end-start) / 1e9;
    /*   printf("This is value: %d\t%d\t%d\t%d\t%d\t%d\t\n",rA,cA,rB,cB,rC,cC);
        printf("%f\t%f\t%f\n", A[4], B[6], C[3]);*/
         printf("%f Gflops\n", flops);
     }
   Cblacs_gridexit( 0 );
   MPI_Finalize();
   free(A);
   free(B);
   free(C);
   return 0;
}
Welcome. Work in small steps. Do not use OpenMP until your basic MPI works perfectly. Try to test your code as often as possible once you add a new small piece of functionality. Use a debugger, print statements, or address sanitization to find out on which line of code the crash happens. - Vladimir F Героям слава
If you need help debugging your code, you should post a minimal reproducible example. - Gilles Gouaillardet
Thanks @Gilles Gouaillardet. - Nguyen Thi My Tuyen
Thanks @Vladimir F, I used a debugger and print statements, but they show me the same error. I don't know how to do what you said about "address sanitization to find out on which line of code the crash happens." - Nguyen Thi My Tuyen
Forget the sanitization. Use the debugger or print statements to find out where it crashes. You can also use code bisection, but that may require some more experience: en.wikipedia.org/wiki/Bisection_(software_engineering) - Vladimir F Героям слава
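
For reference, "address sanitization" here means compiling with AddressSanitizer, which GCC and Clang support through the -fsanitize=address flag; the instrumented program then reports the source file and line of the first invalid memory access. Assuming mpicc wraps GCC or Clang, and reusing the illustrative file and library names from the answer below, the invocation might look like:

mpicc -g -fsanitize=address stack.c libscalapack.a -llapack -lblas -lgfortran
mpirun -np 4 ./a.out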

1 Answer


OK, this is not really an answer but it's too long for a comment, and I want the formatting that an answer gives you anyway.

So I fixed the bug with blacs_gridexit that I noted in the comments, namely passing ictxt as the parameter, as required by the routine description. I then replaced your routine with the standard pdgemm_, and cut the matrix size to 2,000 x 2,000 to fit on my laptop. With these changes the code runs successfully, at least in the sense that it reports no error and gives a plausible Gflop rate. That suggests to me that either

  • There is a bug in the code you have not shown us
  • There is a problem with your MPI, BLACS, PBLAS and/or ScaLAPACK installation

I would thus reinstall the libraries you are using, making sure they are consistent with your compiler, and run the tests supplied with the libraries. Also include the header files that you have omitted from your code (don't omit these, they are very important!). If the library tests pass, I would suggest the problem is a bug in your own code. What is the reason you can't show it?
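
For reference, declarations along the following lines would address the implicit-declaration warnings shown further down. This is a sketch assuming the common convention for calling these Fortran routines from C; the exact signatures can differ between implementations (MKL, for instance, ships its own headers), so prefer the headers from your own installation:

void Cblacs_pinfo(int *mypnum, int *nprocs);
void Cblacs_get(int icontxt, int what, int *val);
void Cblacs_gridinit(int *icontxt, const char *order, int nprow, int npcol);
void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myprow, int *mypcol);
void Cblacs_gridexit(int icontxt);
int numroc_(const int *n, const int *nb, const int *iproc, const int *isrcproc, const int *nprocs);
void descinit_(int *desc, const int *m, const int *n, const int *mb, const int *nb,
               const int *irsrc, const int *icsrc, const int *ictxt, const int *lld, int *info);
void pdgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
             const double *alpha, const double *a, const int *ia, const int *ja, const int *desca,
             const double *b, const int *ib, const int *jb, const int *descb,
             const double *beta, double *c, const int *ic, const int *jc, const int *descc);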

Anyway, below is the code I successfully ran. If I were doing this properly in my own code, I would definitely also fix all those compiler warnings by making sure appropriate prototypes are in scope wherever the functions are invoked.

ian-admin@agon ~/work/stack/mpi $ cat stack.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "mpi.h"

int main(void) {
   int i, j, k;
/************  MPI ***************************/
   int myrank_mpi, nprocs_mpi;
   MPI_Init( NULL, NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);
/************  BLACS ***************************/
   int ictxt, nprow, npcol, myrow, mycol,nb;
   int info,itemp;
   int _ZERO=0,_ONE=1;
     int M=2000;
     int K=2000;
     int N=2000;
   nprow = 2; npcol = 2; 
     nb=1200;


   Cblacs_pinfo( &myrank_mpi, &nprocs_mpi ) ;
   Cblacs_get( -1, 0, &ictxt );
   Cblacs_gridinit( &ictxt, "Row", nprow, npcol );
   Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );
   //printf("myrank = %d\n",myrank_mpi);


   int rA = numroc_( &M, &nb, &myrow, &_ZERO, &nprow );
   int cA = numroc_( &K, &nb, &mycol, &_ZERO, &npcol );
   int rB = numroc_( &K, &nb, &myrow, &_ZERO, &nprow );
   int cB = numroc_( &N, &nb, &mycol, &_ZERO, &npcol );
   int rC = numroc_( &M, &nb, &myrow, &_ZERO, &nprow );
   int cC = numroc_( &N, &nb, &mycol, &_ZERO, &npcol );

   double *A = (double*) malloc(rA*cA*sizeof(double));
   double *B = (double*) malloc(rB*cB*sizeof(double));
   double *C = (double*) malloc(rC*cC*sizeof(double));

   int descA[9],descB[9],descC[9];

     descinit_(descA, &M,   &K,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rA,  &info);
     descinit_(descB, &K,   &N,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rB,  &info);
     descinit_(descC, &M,   &N,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rC,  &info);

   double alpha = 1.0; double beta = 1.0;   
    double start, end, flops;
     srand(time(NULL)*myrow+mycol);
     #pragma simd
     for (j=0; j<rA*cA; j++)
     {
         A[j]=((double)rand()-(double)(RAND_MAX)*0.5)/(double)(RAND_MAX);
    //   printf("A in myrank: %d\n",myrank_mpi);
     }
//   printf("A: %d\n",myrank_mpi);
     #pragma simd
     for (j=0; j<rB*cB; j++)
     {
         B[j]=((double)rand()-(double)(RAND_MAX)*0.5)/(double)(RAND_MAX);
     }
     #pragma simd
     for (j=0; j<rC*cC; j++)
     {
         C[j]=((double)rand()-(double)(RAND_MAX)*0.5)/(double)(RAND_MAX);
     }
     MPI_Barrier(MPI_COMM_WORLD);

  start=MPI_Wtime();

    pdgemm_ ("N", "N", &M , &N , &K , &alpha, A , &_ONE, &_ONE , descA , B , &_ONE, &_ONE , descB , &beta , C , &_ONE, &_ONE , descC );
MPI_Barrier(MPI_COMM_WORLD);
     end=MPI_Wtime();

     if (myrow==0 && mycol==0)
     {
        flops = 2 * (double) M * (double) N * (double) K / (end-start) / 1e9;
    /*   printf("This is value: %d\t%d\t%d\t%d\t%d\t%d\t\n",rA,cA,rB,cB,rC,cC);
        printf("%f\t%f\t%f\n", A[4], B[6], C[3]);*/
         printf("%f Gflops\n", flops);
     }
   Cblacs_gridexit( ictxt );
   MPI_Finalize();
   free(A);
   free(B);
   free(C);
   return 0;
}
ian-admin@agon ~/work/stack/mpi $ mpicc -g stack.c /home/ian-admin/Downloads/scalapack-2.0.2/libscalapack.a -llapack -lblas -lgfortran
stack.c: In function ‘main’:
stack.c:24:4: warning: implicit declaration of function ‘Cblacs_pinfo’ [-Wimplicit-function-declaration]
    Cblacs_pinfo( &myrank_mpi, &nprocs_mpi ) ;
    ^~~~~~~~~~~~
stack.c:25:4: warning: implicit declaration of function ‘Cblacs_get’ [-Wimplicit-function-declaration]
    Cblacs_get( -1, 0, &ictxt );
    ^~~~~~~~~~
stack.c:26:4: warning: implicit declaration of function ‘Cblacs_gridinit’ [-Wimplicit-function-declaration]
    Cblacs_gridinit( &ictxt, "Row", nprow, npcol );
    ^~~~~~~~~~~~~~~
stack.c:27:4: warning: implicit declaration of function ‘Cblacs_gridinfo’ [-Wimplicit-function-declaration]
    Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );
    ^~~~~~~~~~~~~~~
stack.c:31:13: warning: implicit declaration of function ‘numroc_’ [-Wimplicit-function-declaration]
    int rA = numroc_( &M, &nb, &myrow, &_ZERO, &nprow );
             ^~~~~~~
stack.c:44:6: warning: implicit declaration of function ‘descinit_’ [-Wimplicit-function-declaration]
      descinit_(descA, &M,   &K,   &nb,  &nb,  &_ZERO, &_ZERO, &ictxt, &rA,  &info);
      ^~~~~~~~~
stack.c:72:5: warning: implicit declaration of function ‘pdgemm_’ [-Wimplicit-function-declaration]
     pdgemm_ ("N", "N", &M , &N , &K , &alpha, A , &_ONE, &_ONE , descA , B , &_ONE, &_ONE , descB , &beta , C , &_ONE, &_ONE , descC );
     ^~~~~~~
stack.c:83:4: warning: implicit declaration of function ‘Cblacs_gridexit’ [-Wimplicit-function-declaration]
    Cblacs_gridexit( ictxt );
    ^~~~~~~~~~~~~~~
/usr/bin/ld: warning: libgfortran.so.3, needed by //usr/lib/liblapack.so, may conflict with libgfortran.so.5
ian-admin@agon ~/work/stack/mpi $ mpirun -np 4 --oversubscribe ./a.out 
9.424291 Gflops