2
votes

I need some help to parallelize the pi calculation with the monte carlo method with openmp by a given random number generator, which is not thread safe.

First: This SO thread didn't help me.

My own try is the following #pragma omp statements. I thought the i, x and y vars should be init by each thread and should than be private. z ist the sum of all hits in the circle, so it should be summed after the implied barriere after the for loop.

Think the main problem ist the static state var of the random number generator. I made a critical section where the functions are called, so that only one thread per time could execute it. But the Pi solutions doesn't scale with more higher values.

Note: I should not use another RNG, but its okay to make little changes on it.

int main (int argc, char *argv[]) {

    int i, z = 0, threads = 8, iters = 100000;
    double x,y, pi;

    #pragma omp parallel firstprivate(i,x,y) reduction(+:z) num_threads(threads)
        for (i=0; i<iters; ++i) {
            #pragma omp critical
            {
                x = rng_doub(1.0);
                y = rng_doub(1.0);
            }
            if ((x*x+y*y) <= 1.0)
                z++;
        }

    pi = ((double) z / (double) (iters*threads))*4.0;
    printf("Pi: %lf\n", pi);;
    return 0;
}

This RNG is actually an included file, but as I'm not sure if I create the header file correct, I integrated it in the other program file, so I have only one .c file.

#define RNG_MOD 741025

int rng_int(void) {
    static int state = 0;

    return (state = (1366 * state + 150889) % RNG_MOD);
}

double rng_doub(double range) {
    return ((double) rng_int()) / (double) ((RNG_MOD - 1)/range);
}

I've also tried to make the static int state global, but it doesn't change my result, maybe I done it wrong. So please could you help me make the correct changes? Thank you very much!

2
Could you please clarify the meaning of the words: "But the Pi solutions doesn't scale with more higher values." Apart from the fact that the critical section serialises your threads and you won't get any speed-up, the code looks correct to me.Hristo Iliev
Yes, sure. I mean that the calculated pi should be closer to the real pi value, if I run with more iterations. But with this random number generator i can't see this behaviour in general. And the docent says its because of the thread-unsafty of the state var. I should set it global and use one or more correct #pragma omp statements to handle it. But i've tried a lot of combinations and there changes nothing. Don't know, if I take the state var global, than as static or not? And is critical right at this spot? Need state to be shared()? Or better threadprivate(state)? Have already tried a lot.Danny Archer
I updated my answer using your random function.Z boson
The OpenMP construct that your docent probably refers to is threadprivate. See my semi answer below for an explanation why it won't improve the solution a lot.Hristo Iliev
@DannyArcher, I updated my answer using some of Hristo's suggestions.Z boson

2 Answers

0
votes

Try the code below. It makes a private state for each thread. I did something similar with the at rand_r function Why does calculation with OpenMP take 100x more time than with a single thread?

Edit: I updated my code using some of Hristo's suggestions. I used threadprivate (for the first time). I also used a better rand function which gives a better estimate of pi but it's still not good enough.

One strange things was I had to define the function rng_int after threadprivate otherwise I got an error "error: 'state' declared 'threadprivate' after first use". I should probably ask a question about this.

//gcc -O3 -Wall -pedantic -fopenmp main.c
#include <omp.h>
#include <stdio.h>

#define RNG_MOD 0x80000000
int state;

int rng_int(void);
double rng_doub(double range);

int main() {
    int i, numIn, n;
    double x, y, pi;

    n = 1<<30;
    numIn = 0;

    #pragma omp threadprivate(state)
    #pragma omp parallel private(x, y) reduction(+:numIn) 
    {

        state = 25234 + 17 * omp_get_thread_num();
        #pragma omp for
        for (i = 0; i <= n; i++) {
            x = (double)rng_doub(1.0);
            y = (double)rng_doub(1.0);
            if (x*x + y*y <= 1) numIn++;
        }
    }
    pi = 4.*numIn / n;
    printf("asdf pi %f\n", pi);
    return 0;
}

int rng_int(void) {
   // & 0x7fffffff is equivalent to modulo with RNG_MOD = 2^31
   return (state = (state * 1103515245 + 12345) & 0x7fffffff);
}

double rng_doub(double range) {
    return ((double)rng_int()) / (((double)RNG_MOD)/range);
}

You can see the results (and edit and run the code) at http://coliru.stacked-crooked.com/a/23c1753a1b7d1b0d

8
votes

Your original linear congruent PRNG has a cycle length of 49400, therefore you are only getting 29700 unique test points. This is a terrible generator to be used for any kind of Monte Carlo simulations. Even if you make 100000000 trials, you won't get any closer to the true value of Pi because you are simply repeating the same points over and over again and as a result both the final value of z and iters are simply multiplied by the same constant, which cancel in the end during the division.

The per-thread seed introduced by Z boson improves the situation a little bit with the number of unique points increasing with the total number of OpenMP threads. The increase is not linear since if the seed of one PRNG falls in the sequence of another PRNG, both PRNGs produce the same sequence shifted with no more than 49400 elements. Given the cycle length, each PRNG covers 49400/RNG_MOD = 6,7% of the total output range and that is the probability of two PRNGs being synchronised. There are a total of RNG_MOD/49400 = 15 unique sequences possible. It basically means that in the best seeding case scenario you won't be able to get past 30 threads as any other thread would simply repeat the result of some of the others. The multiplier 2 comes from the fact that each point uses two elements from the sequence and therefore it is possible to get a different set of points if you shift the sequence by one element.

The ultimate solution is to completely drop your PRNG and stick to something like Mersenne twister MT19937, which has a cycle length of 219937 − 1 and a very strong seeding algorithm. If you are not able to use another PRNG as you state in your question, at least modify the constants of the LCG to match those used in rand():

int rng_int(void) {
   static int state = 1;

   // & 0x7fffffff is equivalent to modulo with RNG_MOD = 2^31
   return (state = (state * 1103515245 + 12345) & 0x7fffffff);
}

Note that rand() is not a good PRNG - it is still bad. It is just a little better than the one used in your code.