Why does calculation with OpenMP take 100x more time than with a single thread?

Question

I am trying to test Pi calculation problem with OpenMP. I have this code:

#pragma omp parallel private(i, x, y, myid) shared(n) reduction(+:numIn) num_threads(NUM_THREADS)
{
printf("Thread ID is: %d\n", omp_get_thread_num());
myid = omp_get_thread_num();
printf("Thread myid is: %d\n", myid);

  for(i = myid*(n/NUM_THREADS); i < (myid+1)*(n/NUM_THREADS); i++) {
//for(i = 0; i < n; i++) {

    x = (double)rand()/RAND_MAX;

    y = (double)rand()/RAND_MAX;

    if (x*x + y*y <= 1) numIn++;

  }
printf("Thread ID is: %d\n", omp_get_thread_num());

}

  return 4. * numIn / n;

}

When I compile with gcc -fopenmp pi.c -o hello_pi and run it time ./hello_pi for n = 1000000000 I get

real 8m51.595s

user 4m14.004s

sys 60m59.533s

When I run it on with a single thread I get

real 0m20.943s

user 0m20.881s

sys 0m0.000s

Am I missing something? It should be faster with 8 threads. I have 8-core CPU.

Might the rand() be the bottleneck due to bad multithreading performance? — urzeit
You shouldn't handle the repartition of the work yourself, use #pragma omp for instead. — Michael M.
Before looking at your code why don't you give the results with optimization on. Use -O3 or at least -O2. — Z boson

sloanyang sloanyang · Accepted Answer · 2013-10-21T09:14:39

Please take a look at the http://people.sc.fsu.edu/~jburkardt/c_src/openmp/compute_pi.c This might be a good implementation for pi computing.

It is quite important to know that how your data spread to different threads and how the openmp collect them back. Usually, a bad design (which has data dependencies across threads) running on multiple thread will result in a slower execution than a single thread .

Why does calculation with OpenMP take 100x more time than with a single thread?

3 Answers