9
votes

I'm running a completely parallel matrix multiplication program on a Mac Pro with a Xeon processor. I create 8 threads (as many threads as cores), and there are no shared writing issues (no two threads write to the same location). For some reason, my version using pthread_create and pthread_join is about twice as slow as the version using #pragma omp.

There are no other differences in anything... same compile options, same number of threads in both cases, same code (except the pragma/pthread portions obviously), etc.

And the loops are very big -- I'm not parallelizing small loops.

(I can't really post the code because it's school work.)

Why might this be happening? Doesn't OpenMP use POSIX threads itself? How can it be faster?

Do they both use the same amount of cumulative CPU time? - Gabe
Have you verified that OpenMP is using the same number of threads as your manual version? - Gabe
What happens if you only use 7 threads on each? - Jess
@Jess: Brilliant question!! I tried it, and it was faster... it turned out I was creating 8 threads, but I already had a master thread, for a total of 9, which is one more than the number of cores! (Wow, haha...) Feel free to post that as an answer so I can accept it. :) - user541686

1 Answer

6
votes

What is your main thread doing? Without seeing your code, my guess is that the main thread is barely running but still eating up clock cycles while the pthreads finish, and then it starts up again and continues. Each time it's given cycles, there is overhead from pausing and resuming the other threads.

In OpenMP, the main thread probably goes to sleep, and waits for a wake-up event when the parallel regions finish.
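For illustration only, here is a minimal sketch of one way to keep the total number of busy threads equal to the core count with plain pthreads: create one fewer worker than cores and let the main thread compute the last block of rows itself, which is roughly what an OpenMP parallel region does with its master thread. This is not the asker's code (it was never posted); the names `N`, `TOTAL_THREADS`, `row_range`, `multiply_rows` and `parallel_multiply` are all hypothetical, and the value 8 is assumed to match the machine's core count.

```c
#include <pthread.h>
#include <stddef.h>

#define N 256            /* hypothetical matrix dimension */
#define TOTAL_THREADS 8  /* assumed to equal the number of cores */

static double A[N][N], B[N][N], C[N][N];

/* Half-open range of rows [row_begin, row_end) handled by one thread. */
typedef struct {
    size_t row_begin;
    size_t row_end;
} row_range;

/* Computes the rows of C = A * B that fall inside the given range. */
static void *multiply_rows(void *arg)
{
    const row_range *r = arg;
    for (size_t i = r->row_begin; i < r->row_end; ++i) {
        for (size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < N; ++k)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    }
    return NULL;
}

/* Returns 0 on success, or the first pthread error code encountered. */
int parallel_multiply(void)
{
    pthread_t workers[TOTAL_THREADS - 1];
    row_range ranges[TOTAL_THREADS];
    size_t created = 0;
    int rc = 0;

    /* Split the rows into TOTAL_THREADS contiguous blocks. */
    for (size_t t = 0; t < TOTAL_THREADS; ++t) {
        ranges[t].row_begin = t * N / TOTAL_THREADS;
        ranges[t].row_end = (t + 1) * N / TOTAL_THREADS;
    }

    /* Spawn only TOTAL_THREADS - 1 workers... */
    for (; created < TOTAL_THREADS - 1; ++created) {
        rc = pthread_create(&workers[created], NULL,
                            multiply_rows, &ranges[created]);
        if (rc != 0)
            break;
    }

    /* ...and let the calling thread do the last block itself. */
    if (rc == 0)
        multiply_rows(&ranges[TOTAL_THREADS - 1]);

    /* Join every worker that was actually created. */
    for (size_t t = 0; t < created; ++t) {
        int join_rc = pthread_join(workers[t], NULL);
        if (rc == 0)
            rc = join_rc;
    }
    return rc;
}
```

To try it, fill `A` and `B`, call `parallel_multiply()` from `main`, and build with something like `cc -O2 -pthread`. Whether this closes the gap with OpenMP on a given machine is something to measure rather than assume.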