Parallelization efficiency of openMP

Question

I have a C++ code containig many for-loops parallelized with openMP on a 8-thread computer.

But the speed of execution with single thread is faster than parallel 8 thread. I was told that if the load of the for-loops increases parallelization will become efficient.

Here with load I mean for example maximum number of iterations for a loop. The thing is I dont have a chance to compare single and 8-thread parallel code for a huge amount of data.

Should I use parallel code anyway? Is it true that parallelization efficiency will increase with load of for-loops?

Your question is too broad for and not well suited for SO. Consider narrowing it and providing some code samples. — Hristo Iliev

High Performance Mark High Performance Mark · Accepted Answer · 2012-06-26T12:44:45

The canonical use case for OpenMP is the distribution among a team of threads of the iterations of a high iteration count loop with the condition that the loop iterations have no direct or indirect dependencies.

You can spot what I mean by direct dependencies by considering the question Does the order of loop iteration execution affect the results ?. If, for example, iteration N+1 uses the results of iteration N you have such a dependency, running the loop iterations in reverse order will change the output of the routine.

By indirect dependencies I mean mainly data races, in which threads have to coordinate their access to shared data, in particular they have to ensure that writes to shared variables happen in the correct sequence.

In many cases you can redesign a loop-with-dependencies to remove those dependencies.

IF you have a high iteration count loop which has no such dependencies THEN you have a candidate for good speed-up with OpenMP. Here are the buts:

There is some parallel overhead to the computation at the start and end of each such loop, if the loop count isn't high enough this overhead may outweigh, partially or wholly, the speedup of running the iterations in parallel. The only way to determine if this is affecting your code is to test and measure.
There can be dependencies between loop iterations more subtle than I have already outlined. Depending on your system architecture and the computations inside the loop you might (without realising it) program your threads to fight over access to cache or to I/O resources, or to any other resource. In the worst cases this can lead to increasing the number of threads leading to decreasing execution rate.
You have to make sure that each OpenMP thread is backed up by hardware, not by the pseudo-hardware that hyperthreading represents. One core per OpenMP thread, hyperthreading is snake oil in this domain.
I expect there are other buts to put in here, perhaps someone else will help out.

Now, turning to your questions:

Should I use parallel code anyway? Test and measure.
Is it true that parallelization efficiency will increase with load of for-loops? Approximately, but for your code on your hardware, test and measure.

Finally, you can't become a serious parallel computationalist without measuring run times under various combinations of circumstances and learning what the measurements you make are telling you. If you can't compare sequential and parallel execution for huge amounts of data, you'll have to measure them for modest amounts of data and understand the lessons you learn before making predictions about behaviour when dealing with huge amounts of data.

Parallelization efficiency of openMP

1 Answers