Matrix multiplication and global reduction operation in mpi

Question

For example, if I want to multiply 4 of 2 times 2 matrix together, for simplicity, let's say all of them are identical and have entry 1.

Then I am wondering how should I use global reduction in mpi to parallize this? Let's assume the size is 4.

Could you please give me an idea of doing this? Thanks!

# include <stdio.h>
# include <mpi.h>
# define N 4

//Create the 2 times 2 matrix type
typedef double Matrix[2][2];

void printMatrix(Matrix m);
void unitMatrix(Matrix m);
void randomMatrix(Matrix m);
void multMatrix(Matrix r, Matrix a, Matrix b);
void copyMatrix(Matrix out, Matrix in);
double random_number(void);
void my_range(int n, int *i1, int *i2);

int main(int argc, char *argv[])
{
    //Create a single matrix a
    Matrix a;
    Matrix buf;
    //Create a set of 100 matrix
    Matrix b[N];
    int i;
    int rank, i1, i2;
    double row1[2];
    double row2[2];
    double col1[2];
    double col2[2];


    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    my_range(N, &i1, &i2);

    for (i = 0; i < i1; i++) {
        randomMatrix(a);
    }
    for (i = i1; i <= i2; i++) {
    randomMatrix(b[i]);
    }
    for (i = i2 + 1; i < N; i++) {
        randomMatrix(a);
    }

    unitMatrix(a);

    for (i = i1; i <= i2; i++) {
    multMatrix(a, a, b[i]);
    MPI_Reduce(&a,&buf, 4, MPI_DOUBLE, MPI_PROD, 0,
             MPI_COMM_WORLD);
    }

    if (rank == 0) printMatrix(buf);

    MPI_Finalize();
    return 0;
}

//print a single matrix
void printMatrix(Matrix m) 
{
    printf("%26.18e %26.18e %26.18e %26.18e\n", 
       m[0][0], m[0][1], m[1][0], m[1][1]);
}

void unitMatrix(Matrix m)
{
    m[0][0] = 1.0;
    m[0][1] = 0.0;
    m[1][0] = 0.0;
    m[1][1] = 1.0;
}

void randomMatrix(Matrix m)
{
    m[0][0] = 1.0;
    m[0][1] = 1.0;
    m[1][0] = 1.0;
    m[1][1] = 1.0;
}

double random_number(void)
{
    const int mr = 714025;
    const int ia = 1366;
    const int ic = 150889;
    const double qdnorm = 1.0 / mr;
    static int irandom = 0;

    irandom = (ia * irandom + ic) % mr;
    return(irandom * qdnorm);
}

void multMatrix(Matrix r, Matrix a, Matrix b)
{
    // multMatrix(r, a, b) calculates  r = a * b
    // multMatrix(a, a, b) calculates  a = a * b
    // multMatrix(a, b, a) calculates  a = b * a

    Matrix tmp;

    tmp[0][0] = a[0][0] * b[0][0] + a[1][0] * b[0][1];
    tmp[0][1] = a[0][1] * b[0][0] + a[1][1] * b[0][1];
    tmp[1][0] = a[0][0] * b[1][0] + a[1][0] * b[1][1];
    tmp[1][1] = a[0][1] * b[1][0] + a[1][1] * b[1][1];

    copyMatrix(r, tmp);
}

void copyMatrix(Matrix out, Matrix in)
{
    out[0][0] = in[0][0];
    out[0][1] = in[0][1];
    out[1][0] = in[1][0];
    out[1][1] = in[1][1];
}

void my_range(int n, int *i1, int*i2)
{
    int size, rank, chunk, rest;

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    chunk = n / size;
    rest = n % size;

    if (rank < rest) {
    chunk = chunk + 1;
    *i1 = chunk * rank;
    } else {
    *i1 = chunk * rank + rest;
    }

    *i2 = *i1 + chunk - 1;
}

There's lots of work out there on how to multiply two matrices together using MPI, can you tell us what you've tried and why it's not working and we can try to help you out? We don't really do homework problems for people here until they've shown that they've done their own work too. — Wesley Bland
I am not asking how to parallelize the matrix multiplication itself, but parallelization of chain of matrices. Assume matrix multiplies with each other normally. As you know, the matrix multiplication is associative but NOT commutative. So, if we have 100 matrices, then we can definitely make 4 threads processing 25 matrices multiplication at the same time, and then use global reduction to combine the local result. Now I am stuck at how to combine those local results. — Cancan

Hristo Iliev Hristo Iliev · Accepted Answer · 2014-01-31T11:58:56

Your code reduces the partial results using element-wise matrix multiplication, i.e. r[i][j] = a[i][j] * b[i][j], therefore giving you the wrong result. As already noted by haraldkl, you could use MPI's mechanism for user-defined MPI reduction operators MPI_Op_create. You should also create a user-defined MPI datatype in order to able to handle each array as a single matrix entity. For example:

void myMatrixProd(Matrix *in, Matrix *inout, int *len, MPI_Datatype *dptr)
{
   int i;

   for (i = 0; i < *len; i++)
   {
      multMatrix(inout[i], in[i], inout[i]);
   }
}

...

MPI_Op multOp;
MPI_Datatype matrixType;

MPI_Type_contiguous(2*2, MPI_DOUBLE, &matrixType);
MPI_Type_commit(&matrixType);

MPI_Op_create(myMatrixProd, 0, &multOp);

Matrix a, buf;

// Compute partial product into a
multMatrix(...);

// Reduce the partial products to get the total into rank 0
MPI_Reduce(&a, &buf, 1, matrixType, multOp, 0, MPI_COMM_WORLD);

An important thing to note is that the second argument to MPI_Op_create is 0. This is a flag that indicates whether the reduction operator is commutative. Matrix multiplication is not commutative (but still associative as required for all MPI reduction operators) and therefore one should specify 0 there.

Matrix multiplication and global reduction operation in mpi

2 Answers