MPI Scatterv : How to deal with the root process?

Question

I am still not sure how the root process is treated in MPI_Scatter / MPI_Scatterv.

If I divide an array the way my code below does, do I need to include the root process among the receivers (making sendcounts an array of size nproc), or is it excluded?

In my matrix multiplication example, one of the processes still misbehaves and terminates the program prematurely:

void readMatrix();

double StartTime;
int rank, nproc, proc;
//double matrix_A[N_ROWS][N_COLS];
double **matrix_A;
//double matrix_B[N_ROWS][N_COLS];
double **matrix_B;
//double matrix_C[N_ROWS][N_COLS];
double **matrix_C;
int low_bound = 0; //low bound of the number of rows of each process
int upper_bound = 0; //upper bound of the number of rows of [A] of each process
int portion = 0; //portion of the number of rows of [A] of each process


int main (int argc, char *argv[]) {

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    matrix_A = (double **)malloc(N_ROWS * sizeof(double*));
    for(int i = 0; i < N_ROWS; i++) matrix_A[i] = (double *)malloc(N_COLS * sizeof(double));
    matrix_B = (double **)malloc(N_ROWS * sizeof(double*));
    for(int i = 0; i < N_ROWS; i++) matrix_B[i] = (double *)malloc(N_COLS * sizeof(double));
    matrix_C = (double **)malloc(N_ROWS * sizeof(double*));
    for(int i = 0; i < N_ROWS; i++) matrix_C[i] = (double *)malloc(N_COLS * sizeof(double));

    int *counts = new int[nproc](); // array to hold number of items to be sent to each process

    // -------------------> If we have more than one process, we can distribute the work through scatterv
    if (nproc > 1) {

        // -------------------> Process 0 initializes matrices and scatters the portions of the [A] Matrix
        if (rank==0) {
            readMatrix();
        }
        StartTime = MPI_Wtime();
        int counter = 0;
        for (int proc = 0; proc < nproc; proc++) {
            counts[proc] = N_ROWS / nproc ;
            counter += N_ROWS / nproc ;
        }
        counter = N_ROWS - counter;
        counts[nproc-1] = counter;
        //set bounds for each process
        low_bound = rank*(N_ROWS/nproc);
        portion = counts[rank];
        upper_bound = low_bound + portion;
        printf("I am process %i and my lower bound is %i and my portion is %i and my upper bound is %i \n",rank,low_bound, portion,upper_bound);
        //scatter the work among the processes
        int *displs = new int[nproc]();
        displs[0] = 0;
        for (int proc = 1; proc < nproc; proc++) displs[proc] = displs[proc-1] + (N_ROWS/nproc);
        MPI_Scatterv(matrix_A, counts, displs, MPI_DOUBLE, &matrix_A[low_bound][0], portion, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        //broadcast [B] to all the slaves
        MPI_Bcast(&matrix_B, N_ROWS*N_COLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);


        // -------------------> Everybody does their work
        for (int i = low_bound; i < upper_bound; i++) {//iterate through a given set of rows of [A]
            for (int j = 0; j < N_COLS; j++) {//iterate through columns of [B]
                for (int k = 0; k < N_ROWS; k++) {//iterate through rows of [B]
                    matrix_C[i][j] += (matrix_A[i][k] * matrix_B[k][j]);
                }
            }
        }

        // -------------------> Process 0 gathers the work
        MPI_Gatherv(&matrix_C[low_bound][0],portion,MPI_DOUBLE,matrix_C,counts,displs,MPI_DOUBLE,0,MPI_COMM_WORLD);
    }
...

Your matrix_A is a double** so it doesn't fit the profile for the first parameter of MPI_Scatterv(). matrix_A[0] could have, but since you used a loop of malloc() to allocate the memory, it isn't contiguously stored and therefore cannot be used this way. — Gilles

Jorge Bellon · Accepted Answer · 2016-08-25T12:03:23

The root process also takes part on the receiving side, so sendcounts needs one entry per process, root included. If you do not want the root to receive any data, just set sendcounts[root] = 0.

See the MPI_Scatterv documentation for the exact values you have to pass.
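A minimal sketch of filling the two arrays for all nproc ranks, root included. The helper name block_row_layout and its parameters are hypothetical, not part of MPI or of the question's code. It also spreads leftover rows over the first ranks, which differs from the question's approach of adjusting the last rank. Counts and displacements are measured in elements of the send type (here doubles), so a row count has to be multiplied by the number of columns.

```c
/* Hypothetical helper: fills counts[] and displs[] (both of length nproc,
 * root included) for a row-wise split of an n_rows x n_cols matrix of
 * doubles stored contiguously in row-major order.
 * All values are in units of doubles, not rows. */
void block_row_layout(int n_rows, int n_cols, int nproc,
                      int *counts, int *displs)
{
    int base = n_rows / nproc;   /* rows that every rank receives */
    int extra = n_rows % nproc;  /* leftover rows, one each to the first ranks */
    int offset = 0;              /* running start position, in doubles */

    for (int p = 0; p < nproc; p++) {
        int rows = base + (p < extra ? 1 : 0);
        counts[p] = rows * n_cols;
        displs[p] = offset;
        offset += counts[p];
    }
}
```

Assuming a contiguous send buffer A on the root and a receive buffer local holding counts[rank] doubles on every rank, the call would then look like MPI_Scatterv(A, counts, displs, MPI_DOUBLE, local, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD). To keep the root from receiving anything, overwrite counts[0] with 0 and recompute the displacements accordingly.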

However, be careful with your buffers. I strongly suggest allocating each matrix as a one-dimensional array with a single malloc, like this:

double* matrix = (double*) malloc( N_ROWS * N_COLS * sizeof(double) );

If you still want to use a two-dimensional array, then you may need to describe its layout with an MPI derived datatype.

The datatype you are passing now is not valid if you want to send more than one row in a single MPI transfer. With MPI_DOUBLE you are telling MPI that the buffer contains a contiguous array of count MPI_DOUBLE values. Since you allocate your two-dimensional array with multiple malloc calls, your data is not contiguous.
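One common way to keep the matrix[i][j] syntax while still giving MPI a contiguous buffer is to allocate the data in one block and build an array of row pointers on top of it. The sketch below illustrates that idea as an alternative to a derived datatype. The names alloc_matrix and free_matrix are hypothetical, and the code assumes n_rows > 0.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates an n_rows x n_cols matrix as ONE contiguous,
 * zero-initialized block and returns an array of row pointers into it, so
 * m[i][j] still works. m[0] points at the start of the contiguous data and is
 * what would be handed to MPI as the buffer. Assumes n_rows > 0. */
double **alloc_matrix(int n_rows, int n_cols)
{
    double *data = (double *)calloc((size_t)n_rows * (size_t)n_cols, sizeof(double));
    double **rows = (double **)malloc((size_t)n_rows * sizeof(double *));

    if (data == NULL || rows == NULL) {
        free(data);
        free(rows);
        return NULL;
    }
    for (int i = 0; i < n_rows; i++)
        rows[i] = data + (size_t)i * (size_t)n_cols;  /* row i starts i*n_cols doubles in */
    return rows;
}

/* Releases a matrix obtained from alloc_matrix. */
void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]);  /* the contiguous data block */
        free(m);     /* the row-pointer array */
    }
}
```

With this layout the send buffer is matrix_A[0] (a double*), not matrix_A itself (a double**), and the same applies to the buffers passed to MPI_Bcast and MPI_Gatherv.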
