1 vote

I've got a couple of test methods that initialize an array a0, but the array differs in size with each method. These arrays are initialized in the process with rank == 0. Now, in another file, I have an implementation of a method that does some computation using values from that array (it doesn't modify them). It looks somewhat like this (pseudocode):

res = 0;
x = 0;
for (i = mystart; i < myend; ++i)    // each process handles its own [mystart, myend) range
  for (j = 0; j < length; ++j)
    x += a[j] * multiplier;          // a[] is only filled on rank 0

MPI_Reduce(&x, &res, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

I have a requirement that those computations may only be performed by processes other than root, but since they can't see what values a[] holds, it doesn't work. I tried MPI_Bcast(a, ?, MPI_DOUBLE, 0, MPI_COMM_WORLD), but I can't fill in the size, since it differs with every method call. The whole code is kinda big, but if someone wants to take a look I can publish it somewhere...

Is there an easy way to make those values visible to the other processes?

EDIT: As suggested, I uploaded the code. Main function: http://pastebin.com/tREeHMZM Methods for computation and class declaration: http://pastebin.com/BBedJ7bA

Dividing the loops works fine; it's just those computations that don't work, and I'm not sure how to make the processes communicate with each other.

You probably do need to post some more of the relevant fragments. This helps keep the question concise and makes it more of an MCVE. I'd post the MPI_Bcast and the "client side" code that receives the data, plus enough code showing how the size is generated and used. To avoid going round and round on posting more here [based on feedback], add a link [e.g. to pastebin] to the full code. When you get enough feedback to solve the problem, you can edit the fragments into a complete question here [and optionally remove the external link]. You could also post it all here; the limit is 30,000 chars. – Craig Estey
As I couldn't paste another link, here's my attempt at making the Calc method parallel: pastebin.com/t53fH3fh – adammayer
@Craig Estey I've uploaded the code as you suggested. – adammayer
I downloaded all the code, but, AFAICT, Function.h is missing. I'll try to analyze things without it, but having it would help. – Craig Estey
Broadcast the size first, then broadcast the array. – NoseKnowsAll

2 Answers

0 votes

Why not broadcast the size of the array you're going to broadcast first?
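
For example, something like this (a minimal standalone sketch, not taken from your code; the array name a and the size 150 are only placeholders for whatever each test method uses):

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int size = 0;
    double *a = 0;

    if (rank == 0) {
        size = 150;                       // only rank 0 knows the real size
        a = new double[size];
        for (int i = 0; i < size; i++)
            a[i] = i * 0.5;
    }

    // step 1: every rank learns the size
    MPI_Bcast(&size, 1, MPI_INT, 0, MPI_COMM_WORLD);

    // step 2: non-root ranks now know how much space to allocate
    if (rank != 0)
        a = new double[size];

    // step 3: broadcast the array itself
    MPI_Bcast(a, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("rank %d: a[size - 1] = %g\n", rank, a[size - 1]);

    delete[] a;
    MPI_Finalize();
    return 0;
}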

0 votes

You were really quite close. I believe I was able to correct the program; the corrected code is below.

Note: I started writing this explanation before I actually modified the program, so read "you must do" as "I did".

I was able to download and build your program. I worked around the missing Function.h, based on what I already had, by changing Function::value into just Function_value [a global function].

The one thing you do have to do is have rank 0 obtain the random seed value and bcast it first, so the client ranks get it and their calls to shuffle produce the same results. So, I created a global ranseed, set from a bcast in main, that shuffle can see.
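
In outline, the pattern looks like this (a condensed slice of the full code below, not a standalone program):

int ranseed;                                         // global, so shuffle() can see it

// in main(), after MPI_Init:
if (glob_rank == 0)
    ranseed = time(NULL);                            // or a fixed / -R supplied value
MPI_Bcast(&ranseed, 1, MPI_INT, 0, MPI_COMM_WORLD);  // every rank now has the same seed

// in shuffle():
srandom(ranseed);                                    // identical seed => identical sequence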

I changed the test* routines to always compute x0, y0, a, and c, regardless of rank. Otherwise, the compiler was complaining about possibly uninitialized values for those pointers. The computation is small, so there is no need to broadcast these values. If this precalculation were intensive [which it isn't], then bcast-ing the arrays would be the way to go.

I think you already realized this in the MPI version of calcError because you were commenting out the bcasts for these arrays.

AFAICT, your MPI calcError looks good except for two things:

(1) The MPI_Reduce call should not be inside the if (myid_c >= 0) block. It needs to be at the bottom, after this block (i.e. even rank 0 needs to call it).

(2) In the old location of the MPI_Reduce [inside the if], you need an error_p = error; after the doubly nested for loop, because error_p is the value that the clients send [and error is what the root receives].

I added a -R command line option to allow manually setting the random seed value. I also added a -M option to switch between the MPI and non-MPI versions of calcError.

I've run the program in MPI mode (mpirun -np 8 <program> -M) and all tests seemed to work. I did a ctrl-c on testL after a while, so no guarantees on that one.


Here's a single [archive] file, similar to your paste; the file separator line is % <filename>

It has all my outlined corrections [please pardon the gratuitous style cleanup].

Warning: Be careful when extracting Function.h as I had to create a skeleton version, so if your version has a class definition, it will get lost.

% Error.h
// mpisize/Error.h -- Error class

#ifndef _mpisize_Error_h_
#define _mpisize_Error_h_

class Error {
  private:
    double *v;
    double *x0;
    double *y0;
    double *a;
    double *c;
    int length;                         // grid length
    double length_2;
    double mult;
    int size;                           // number of gausses to add
    double error;

  public:
    void setValues(double *v, int length, double mult);
    void setCoefficients(double *x0,double *y0, double *a,double *c, int size);
    void calcError(int mpiflg);
    void calcErrorStd();
    void calcErrorMpi();
    double getError();
};

#endif
% Function.h
// mpisize/Function.h -- Function class

#ifndef _mpisize_Function_h_
#define _mpisize_Function_h_

#include <math.h>

//Function code
double
Function_value(double x, double y, double x0, double y0, double a, double c);

#endif
% mpisize.h
// mpisize/mpisize.h -- Function class

#ifndef _mpisize_mpisize_h_
#define _mpisize_mpisize_h_

#include <math.h>
#include "Error.h"
#include "Function.h"

#ifdef _MPISIZE_GLO_
#define EXTRN_MPISIZE       /**/
#else
#define EXTRN_MPISIZE       extern
#endif

EXTRN_MPISIZE int opt_debug;
EXTRN_MPISIZE int opt_send;
EXTRN_MPISIZE int glob_rank;
EXTRN_MPISIZE double tvzero;

double
tvgetf(void);

int
dbgif(int lvl);

void
_dbgprt(const char *fmt,...);

#define dbgprt(_lvl,_fmt...) \
    do { \
        if (dbgif(_lvl)) \
            _dbgprt(_fmt); \
    } while (0)

#endif
% ErrMpi.cpp
// mpisize/ErrMpi -- MPI threaded version of calcError

#include <iostream>
#include <math.h>
#include <mpi.h>
#include "mpisize.h"

void
Error::calcErrorMpi()
{
    int mystart, myend;
    int myid, myid_c;
    int numproc, numproc_c;

    //MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    myid = glob_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &numproc);

    myid_c = myid - 1;
    numproc_c = numproc - 1;

    // NOTE/BUG: length_2 is double, so it needs MPI_DOUBLE
#if 0
    MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
    // MPI_Bcast(&length_2, 1, MPI_INT, 0, MPI_COMM_WORLD);  // broken
    // MPI_Bcast(&length_2, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);  // fixed
#endif

    MPI_Bcast(&error, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    // MPI_Bcast(v, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Bcast(&size, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&mult, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // NOTE/BUG: these are arrays, so second argument must be array count
    if (opt_send) {
        MPI_Bcast(x0, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(y0, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(a, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(c, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    double error_p = 0;

    error = 0;

    if (myid_c >= 0) {
        mystart = (length / numproc_c) * myid_c;

        if (length % numproc_c > myid_c) {
            mystart += myid_c;
            myend = mystart + (length / numproc_c) + 1;
        }
        else {
            mystart += length % numproc_c;
            myend = mystart + (length / numproc_c);
        }

        dbgprt(2,"calcErrorMpi: STARTUP myid_c=%d numproc_c=%d length=%d length_2=%g mystart=%d myend=%d\n",
            myid_c,numproc_c,length,length_2,mystart,myend);

        double dv;
        double vtmp;

        for (int i = mystart; i < myend; i++) {
            for (int j = 0; j < length; j++) {
                vtmp = 0.0;
                for (int k = 0; k < size; k++)
                    vtmp += Function_value((i - length_2) * mult,
                        (j - length_2) * mult,
                        x0[k], y0[k], a[k], c[k]);
                dv = v[i * length + j] - vtmp;
                error += dv * dv;
            }
        }

        error_p = error;

        // if(myid == 0 ) cout << "Proces " << myid << " after reducing error = " << error << endl;

    }

    dbgprt(2,"calcErrorMpi: RED/BEF error_p=%g error=%g\n",error_p,error);
    MPI_Reduce(&error_p, &error, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    dbgprt(2,"calcErrorMpi: RED/AFT error_p=%g error=%g\n",error_p,error);
}
% Error.cpp
// mpisize/Error -- Error routines

#include <iostream>
#include <math.h>
#include "Error.h"
#include "Function.h"

using namespace std;

void
Error::setValues(double *v, int length, double mult)
{
    this->v = v;
    this->length = length;
    this->mult = mult;
    length_2 = length * 0.5;
    cout << "test" << endl;
}

void
Error::setCoefficients(double *x0, double *y0, double *a, double *c, int size)
{
    this->x0 = x0;
    this->y0 = y0;
    this->a = a;
    this->c = c;
    this->size = size;
}

void
Error::calcErrorStd()
{
    double dv;
    double vtmp;

    error = 0;

    for (int i = 0; i < length; i++) {
        for (int j = 0; j < length; j++) {
            vtmp = 0.0;
            for (int k = 0; k < size; k++)
                vtmp += Function_value((i - length_2) * mult,
                    (j - length_2) * mult,
                    x0[k], y0[k], a[k], c[k]);
            dv = v[i * length + j] - vtmp;
            error += dv * dv;
        }
    }
}

void
Error::calcError(int mpiflg)
{

    if (mpiflg)
        calcErrorMpi();
    else
        calcErrorStd();
}

double
Error::getError()
{
    return sqrt(error);
}
% Function.cpp
// mpisize/Function -- function code

#include "Function.h"

// Function code
double
Function_value(double x, double y, double x0, double y0, double a, double c )
{
   return a * exp(-((x - x0) * (x - x0) + (y - y0) * (y - y0)) * c);
}
% mpisize.cpp
// mpisize/mpisize -- main program and test routines

#define _MPISIZE_GLO_
#include "mpisize.h"
#include "Error.h"
#include "Function.h"

#include <iostream>
#include <mpi.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <sys/time.h>
#include <math.h>

using namespace std;

const int LEN = 1250;
const int LEN_2 = LEN / 2;
const int SIZE = LEN * LEN;
const int SHUFFLE_TIMES = 2 * SIZE;
const double MULT = 0.1;
const int COMPLEXITY = 8;

int ranseed;
int opt_mpi;

double
generate(double x, double y, double *x0, double *y0, double *a, double *c, int indexS, int indexE)
{
    double v = 0.0;

    for (int i = indexS; i < indexE; i++) {
        v += Function_value(x, y, x0[i], y0[i], a[i], c[i]);
    }
    return v;
}

void
generate(double *v, double *x0, double *y0, double *a, double *c)
{
    for (int i = 0; i < LEN; i++)
        for (int j = 0; j < LEN; j++)
            v[i * LEN + j] = generate((i - LEN_2) * MULT, (j - LEN_2) * MULT,
                x0, y0, a, c, 0, COMPLEXITY);
}

void
shuffle(double *v)
{
    // randomly swap the particle positions
    int i;
    int j;
    double vtmp;

    srandom(ranseed);

    for (int k = 0; k < SHUFFLE_TIMES; k++) {
        i = random() % SIZE;
        j = random() % SIZE;

        vtmp = v[i];

        v[i] = v[j];
        v[i] = vtmp;
    }
}

void
test(const char *testname,Error *err, double errorExpected)
{

    err->calcError(opt_mpi);

    double error;

    if (glob_rank == 0) {
        error = err->getError();
        cout << endl << "Test " << testname << " error " << error << endl;

        if (fabs(error - errorExpected) > (0.001 + 0.001 * errorExpected)) {
            cerr << "Error: it should be " << errorExpected << endl;
            MPI_Finalize();
            exit(0);
        }
        else {
            cout << " - - - - - - OK" << endl;
        }
    }
}

void
test1(Error *err)
{

    double x0[] = { -3, -2, -2, -1, 1, 2, 2, 3 };
    double y0[] = { -3, -3, 3, -1, 1, -3, 3, 3 };
    double a[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    double c[] = { 0.1, 0.05, 0.02, 0.01, 0.01, 0.01, 0.02, 0.05 };

    err->setCoefficients(x0, y0, a, c, 8);

    test("test1", err, 0);
}

void
test2(Error *err)
{

    double x0[] = { -3, -1, -2, -1, 1, 2, 2, 3, 1, 2, 3, 4, 5, 6, 7 };
    double y0[] = { -3, -3, 3, -1, 1, -3, 1, 3, 1, 2, 3, 4, 1, 2, 3 };
    double a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 1, 2, 3 };
    double c[] = { 0.1, 0.05, 0.02, 0.01, 0.02, 0.01, 0.02, 0.05, 1, 1, 1, 1, 1, 2, 3 };

    err->setCoefficients(x0, y0, a, c, 15);

    test("test2", err, 357.729);
}

void
test3(Error *err)
{

    double x0[] = { -3, -1, -2, -1, 1, 2, 2, 3, -3, -1, -2, -1, 1, 2, 2, 3, 1, 2, 3, 4, 5, 6, 7, -3, -1, -2, -1, 1, 2, 2, 3, 1, 2, 3, 4, 5, 6, 7 };
    double y0[] = { -3, -3, 3, -1, 1, -3, 1, 3, -3, -1, -2, -1, 1, 2, 2, 3, 1, 2, 3, 4, 5, 6, 7, -3, -1, -2, -1, 1, 2, 2, 3, 1, 2, 3, 4, 5, 6, 7 };
    double *a = new double[38];
    double *c = new double[38];

    for (int i = 0; i < 38; i++) {
        a[i] = 1 + i / 38.0;
        c[i] = 2 + i / 38.0;
    }

    err->setCoefficients(x0, y0, a, c, 38);

    test("test3", err, 2975.86);
}

void
test4(Error *err)
{

    double *x0 = new double[150];
    double *y0 = new double[150];
    double *a = new double[150];
    double *c = new double[150];

    for (int i = 0; i < 150; i++) {
        x0[i] = 5 - i * 0.2;
        a[i] = 1 + i / 8.0;
        y0[i] = 2 - i * 0.22;
        c[i] = 2 + i / 38.0;
    }

    err->setCoefficients(x0, y0, a, c, 150);

    test("test4", err, 3303.04);
}

void
testL(Error *err)
{
    double *x0;
    double *y0;
    double *a;
    double *c;

    if (glob_rank == 0)
        cout << "Test pozwalajacy na oszacowanie przyspieszenia" << endl;

    x0 = new double[111];
    y0 = new double[111];
    a = new double[111];
    c = new double[111];

    for (int i = 0; i < 111; i++) {
        x0[i] = 5 - i * 0.2;
        a[i] = 1 + i / 1.1;
        y0[i] = 2 - i * 0.4;
        c[i] = 2 + i / 38.0;
    }

    double toterror = 0;
    double error;

    for (int i = 0; i < 20; i++) {
        a[i] = i;
        err->setCoefficients(x0, y0, a, c, 111 - i * 3);

        err->calcError(opt_mpi);

        error = err->getError();
        dbgprt(2,"testL: POST error=%g\n",error);

        if (glob_rank == 0) {
            toterror += error;
        }
    }

    if (glob_rank == 0)
        cout << "Uwaga: ta wartosc nie moze zalezec od liczby uzytych procesow = " << toterror << endl;
}

int
main(int ac, char **av)
{
    char *cp;

    MPI_Init(&ac, &av);

    MPI_Comm_rank(MPI_COMM_WORLD, &glob_rank);

#if 1
    ranseed = 123767832;
#else
    ranseed = 0;
#endif

    --ac;
    ++av;

    for (;  ac > 0;  --ac, ++av) {
        if (glob_rank != 0)
            break;

        cp = *av;
        if (*cp != '-')
            break;

        switch (cp[1]) {
        case 'd':
            opt_debug = 1;
            break;

        case 'M':
            opt_mpi = 1;
            break;

        case 'R':
            ranseed = atoi(cp + 2);
            break;

        case 'S':
            opt_send = 1;
            break;
        }
    }

    tvzero = tvgetf();

    if (ranseed == 0)
        ranseed = time(NULL);
    if (glob_rank == 0) {
        cout << "Random: " << ranseed << std::endl;
        cout << "Mpi: " << opt_mpi << std::endl;
        cout << "Send: " << opt_send << std::endl;
        cout << "Debug: " << opt_debug << std::endl;
    }

    MPI_Bcast(&tvzero, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(&ranseed, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&opt_mpi, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&opt_send, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&opt_debug, 1, MPI_INT, 0, MPI_COMM_WORLD);

    Error *err = new Error();
    double *v;

    v = new double[SIZE];
    double x0[] = { -3, -2, -2, -1, 1, 2, 2, 3 };
    double y0[] = { -3, -3, 3, -1, 1, -3, 3, 3 };
    double a[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    double c[] = { 0.1, 0.05, 0.02, 0.01, 0.01, 0.01, 0.02, 0.05 };

    generate(v, x0, y0, a, c);
    shuffle(v);

    // make the data available for process 0
    err->setValues(v, LEN, MULT);

    test1(err);
    test2(err);
    test3(err);
    test4(err);
    test1(err);

    testL(err);

    MPI_Finalize();

    return 0;
}

double
tvgetf(void)
{
    struct timespec ts;
    double sec;

    clock_gettime(CLOCK_REALTIME,&ts);

    sec = ts.tv_nsec;
    sec /= 1e9;
    sec += ts.tv_sec;

    return sec;
}

int
dbgif(int lvl)
{
    int goflg = 0;

    do {
        if (! opt_debug)
            break;

        if (glob_rank == 0) {
            goflg = 1;
            break;
        }

        if (lvl > opt_debug)
            goflg = 1;
    } while (0);

    return goflg;
}

void
_dbgprt(const char *fmt,...)
{
    va_list ap;
    double tvnow;

    tvnow = tvgetf();
    tvnow -= tvzero;

    printf("%.9f/%d ",tvnow,glob_rank);

    va_start(ap,fmt);
    vprintf(fmt,ap);
    va_end(ap);
}

UPDATE:

I've updated my full code example above to add tracing. It seems to work in all cases.

However, to answer your followup questions ...

what if I cannot modify the main class?

I presume you mean the "main file", which contains the test* functions? No, you must modify the test* functions, because they are the source of the bugs you're having.

If I had to modify only Error, would it work if I moved the MPI_Reduce out of those loops and added error = error_p at the end of the if statement as you suggested?

Mostly, but you want error_p = error instead. Reread my comment about this in my original post and reread the documentation for MPI_Reduce: the first argument is the send buffer (what each client rank contributes) and the second is the receive buffer (where the root collects the result).
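
In the corrected calcErrorMpi above, that boils down to:

error_p = error;   // send buffer: this rank's partial sum
MPI_Reduce(&error_p, &error, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
//         ^ sendbuf  ^ recvbuf (the summed result, meaningful only on rank 0)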

Interestingly enough, I'm also not able to broadcast length_2.

That's because you were using MPI_INT instead of MPI_DOUBLE.

Then I guess I'd have to add all those Bcasts, but even if I broadcast the size value, I don't know how to set the second parameter of the Bcast function for the rest of the arrays. Everything I try ends with a segmentation fault.

You do not need to broadcast the x0 et al. arrays if you always set them in test*, regardless of rank. Remember that I mentioned the calculation is small enough to be repeated in every rank.

Not setting them in all ranks is the bug.

As expected, no matter what I try, I'm not able to broadcast any array to the rest of the processes.

The broadcast part is working fine (i.e. rank 0 is sending the data correctly), but the non-root ranks have no place to store it, and that's why you're getting segfaults.

Here is your test4:

void
test4(Error * err, int rank)
{

    if (rank == 0) {
        double *x0 = new double[150];
        double *y0 = new double[150];
        double *a = new double[150];
        double *c = new double[150];

        for (int i = 0; i < 150; i++) {
            x0[i] = 5 - i * 0.2;
            a[i] = 1 + i / 8.0;
            y0[i] = 2 - i * 0.22;
            c[i] = 2 + i / 38.0;
        }

        err->setCoefficients(x0, y0, a, c, 150);
    }

    test(err, 3303.04, rank);
}

Here is the modified version:

void
test4(Error * err, int rank)
{
    double *x0 = new double[150];
    double *y0 = new double[150];
    double *a = new double[150];
    double *c = new double[150];

    for (int i = 0; i < 150; i++) {
        x0[i] = 5 - i * 0.2;
        a[i] = 1 + i / 8.0;
        y0[i] = 2 - i * 0.22;
        c[i] = 2 + i / 38.0;
    }

    err->setCoefficients(x0, y0, a, c, 150);

    test(err, 3303.04, rank);
}

In my version, setCoefficients is always called, so the Error object in the non-root ranks will always have valid [non-null] pointers for x0, etc.

In your version, setCoefficients is only called on the root rank, so the non-root ranks never get valid pointers for x0, etc. Thus, you'll get a segfault.

Note that if you had compiled with -Wall, some of the test* functions would have been flagged with warnings about possibly uninitialized values. Fixing those would probably have put you on to the problem earlier.

At a minimum, even if you do wish to bcast x0 et al., you must still ensure the child ranks have valid/sufficient space allocated.

To achieve this, here's a modified test4:

void
test4(Error * err, int rank)
{

    if (rank == 0) {
        double *x0 = new double[150];
        double *y0 = new double[150];
        double *a = new double[150];
        double *c = new double[150];

        for (int i = 0; i < 150; i++) {
            x0[i] = 5 - i * 0.2;
            a[i] = 1 + i / 8.0;
            y0[i] = 2 - i * 0.22;
            c[i] = 2 + i / 38.0;
        }

        err->setCoefficients(x0, y0, a, c, 150);
    }
    else
        err->setBuffers(150);

    test(err, 3303.04, rank);
}

And, you'll need something like:

void
Error::setBuffers(int size)
{
    // Note: this assumes the pointers start out as null (or as earlier heap allocations
    // made here); realloc needs <cstdlib> and an explicit cast in C++.
    this->x0 = (double *) realloc(this->x0, sizeof(double) * size);
    this->y0 = (double *) realloc(this->y0, sizeof(double) * size);
    this->a = (double *) realloc(this->a, sizeof(double) * size);
    this->c = (double *) realloc(this->c, sizeof(double) * size);

    this->size = size;
}

So, after doing all that, you end up with a more complicated solution and you still have to modify test*.

Of course, in the MPI version of calcError, you'd want to broadcast all four coefficient arrays:

MPI_Bcast(x0, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(y0, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(a, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(c, size, MPI_DOUBLE, 0, MPI_COMM_WORLD);