
I have a C program that uses the MPI library. I initialize a dynamic 2-D array whose dimensions (rows and columns) are read from stdin on the root process.

When I try to distribute the elements (columns) cyclically among the other processes, I make no progress. I use MPI_Scatter to distribute the columns to the other processes' arrays, taking advantage of the derived datatype MPI_Type_vector for the 2-D array.

Of course, that only distributes the first column to each process's local 1-D array. So for the rest I put MPI_Scatter in a for loop, and now all the columns are distributed, but only when the number of processes equals the matrix dimension. How can I distribute more than one column to a process using MPI_Scatter?

At this point I doubt this is the best way to solve the problem, because there must be an approach with less communication.

Is it wiser to use a 1-D array for the matrix instead of a 2-D array?

Edit:

After a little thought, it's obvious that if I use the for loop, the derived datatype MPI_Type_vector becomes unnecessary, which suggests the for loop isn't getting me anywhere.

for (i = 0; i < m; i++)
    MPI_Scatter(&(array[i][0]), 1, ub_mpi_t, &local_array[i], 1, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

2 Answers


OK, so let's try the simple case first -- exactly one column per process. Below is my slightly edited version of what you have above; the differences I want to point out are just that we've changed how the array A is allocated, and we're using a single vector datatype:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
  double **A = NULL ;   /*2D array initialised on process 0 */
  double *Adata = NULL;
  double *sendbufptr = NULL;
  int i,j ;
  double *column ; /*1D array for column */
  const int columnlen=6;
  int my_rank, p ;
  MPI_Datatype vector_mpi_t ;

  MPI_Init(&argc,&argv) ;

  MPI_Comm_rank(MPI_COMM_WORLD,&my_rank) ;
  MPI_Comm_size(MPI_COMM_WORLD,&p) ;

  /*initialise 2D array on process 0 and allocate memory*/
  if(my_rank==0)
  {
    A = (double**)malloc(p*sizeof(double *)) ;
    Adata = (double *)malloc(p*columnlen*sizeof(double));
    for(i=0;i<p;i++)
      A[i] = &(Adata[i*columnlen]);

    for (i=0; i<p; i++) 
        for (j=0; j<columnlen; j++)
            A[i][j] = i;

    /* print 2D array to screen */
    printf("Rank 0's 2D array:\n");
    for(i=0;i<p;i++)
    {
      for(j=0;j<columnlen;j++)
        printf( "%lf " , A[i][j]) ;
      printf( "\n") ;
    }
    printf( "\n") ;
    printf( "\n") ;
  }
  /* initialise and allocate memory for 1d column array on every process */
  column = (double*)malloc(columnlen*sizeof(double)) ;
  for(i=0;i<columnlen;i++)
  {
    column[i] = 0 ;
  }

  /*derived datatype for 2D array columns*/
  MPI_Type_vector(columnlen,1,1,MPI_DOUBLE,&vector_mpi_t) ;
  MPI_Type_commit(&vector_mpi_t);

  sendbufptr = NULL;
  if (my_rank == 0) sendbufptr=&(A[0][0]);
  MPI_Scatter(sendbufptr, 1, vector_mpi_t, column, 1, vector_mpi_t, 0, MPI_COMM_WORLD);
  /*print column on every process */

   printf("Rank %d's column: \n", my_rank);
   for(i=0;i<columnlen;i++)
   {
      printf( "%lf " , column[i]) ;
   }
   printf( "\n") ;


  MPI_Type_free(&vector_mpi_t);
  MPI_Finalize() ;

  free(column);
  free(Adata);
  free(A);

  return 0;
}

The key here is that MPI_Scatter takes a pointer to a block of data -- not pointers to pointers. So it won't dereference A[1] and send what it points to, then A[2] and what it points to, and so on; it expects one contiguous block of data. So we've arranged for A's data to be laid out that way in memory (note that this is usually the right way to do things for numerical computation anyway): one column of data followed by the next column, etc. (Although given the way I'm printing out the data, it looks more like rows, but whatever.)

Note too that in the MPI_Scatter call I can't just use &(A[0][0]), because that's dereferencing a null pointer in all but one of the processes.

Going from one column per process to several is pretty straightforward; the column data structure just goes from a 1-D array to a 2-D array laid out the same way A is.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{ 
  double **A = NULL ;   /*2D array initialised on process 0 */
  double *Adata = NULL;
  double *sendbufptr = NULL;
  int i,j ;
  double **columns ; /* 2D array for this rank's columns */
  double *columndata;
  const int columnlen=6;
  int ncolumns;
  int my_rank, p ;
  MPI_Datatype vector_mpi_t ;

  MPI_Init(&argc,&argv) ;

  MPI_Comm_rank(MPI_COMM_WORLD,&my_rank) ;
  MPI_Comm_size(MPI_COMM_WORLD,&p) ;

  ncolumns = 2*p;

  /*initialise 2D array on process 0 and allocate memory*/
  if(my_rank==0)
  {
    A = (double**)malloc(ncolumns*sizeof(double *)) ;
    Adata = (double *)malloc(ncolumns*columnlen*sizeof(double));
    for(i=0;i<ncolumns;i++)
      A[i] = &(Adata[i*columnlen]);

    for (i=0; i<ncolumns; i++) 
        for (j=0; j<columnlen; j++)
            A[i][j] = i;

    /* print 2D array to screen */
    printf("Rank 0's 2D array:\n");
    for(i=0;i<ncolumns;i++)
    {
      for(j=0;j<columnlen;j++)
        printf( "%lf " , A[i][j]) ;
      printf( "\n") ;
    }
    printf( "\n") ;
    printf( "\n") ;
  }
  /* allocate a contiguous 2d columns array on every process */
  columndata = (double*)malloc((ncolumns/p)*columnlen*sizeof(double)) ;
  columns = (double **)malloc((ncolumns/p)*sizeof(double *));
  for(i=0;i<(ncolumns/p);i++)
  {
    columns[i] = &(columndata[i*columnlen]);
  }

  /*derived datatype for 2D array columns*/
  MPI_Type_vector(columnlen,1,1,MPI_DOUBLE,&vector_mpi_t) ;
  MPI_Type_commit(&vector_mpi_t);

  sendbufptr = NULL;
  if (my_rank == 0) sendbufptr=&(A[0][0]);
  MPI_Scatter(sendbufptr, (ncolumns/p), vector_mpi_t, &(columns[0][0]), (ncolumns/p), vector_mpi_t, 0, MPI_COMM_WORLD);

  /*print columns on every process */

   printf("Rank %d's columns: \n", my_rank);
   for(i=0;i<ncolumns/p;i++)
   {
     printf( "[%d]: ", my_rank) ;
     for(j=0;j<columnlen;j++)
     {
        printf( "%lf " , columns[i][j]) ;
     }
     printf( "\n") ;
  }

  MPI_Type_free(&vector_mpi_t);
  MPI_Finalize() ;

  free(columns);
  free(columndata);
  free(Adata);
  free(A);

  return 0;
}

And going to a differing number of columns per process requires using MPI_Scatterv rather than MPI_Scatter.


I'm not sure that I fully understand what you are trying to do, nor how, so this may be off the mark:

It looks as if you are trying to distribute the columns of a 2-D array across a number of processes. Perhaps you have a 10-column array and 4 processes, so 2 processes would get 3 columns each and 2 processes would get 2 columns each? Of course, process 0 'gets' its 3 columns from itself.

Your first step, which you've taken, is to define an MPI_Type_vector which defines a column of the array.

Next -- and here I get a bit puzzled by your use of MPI_Scatter -- why not simply write a loop that sends columns 1,2,3,4,5,6,7,8,9,10 to processes 0,1,2,3,0,1,2,3,0,1 (having first arranged for the receiving processes to have a 2-D array to put the columns into as they arrive)?

The problem with using MPI_Scatter for this is that (I think) you have to send the same amount of data to each receiving process, which, as you have observed, will not work unless the number of processes exactly divides the number of columns in your array. If you had to use MPI_Scatter, you could pad your array with extra columns, but that seems a bit pointless.

Finally, you might be able to do what you want in one statement with MPI_Type_create_subarray or MPI_Type_create_darray, though I have no experience using those.