Using OpenMP (libgomp) in an already multithreaded application

Question

We are using OpenMP (libgomp) to speed up some calculations in a multithreaded Qt application. The OpenMP parallel sections are located in two different threads, although in fact they never execute concurrently. What we observe in this case is that 2N OpenMP threads are launched (where N = OMP_THREAD_LIMIT), apparently interfering with each other. The calculation time is very high, while the processor load is low. Setting OMP_WAIT_POLICY has hardly any effect.

We also tried moving all the OpenMP sections into a single thread (this is not a good solution for us, though, from an architectural point of view). In this case the overall calculation time does drop and the processor is fully loaded, but only if OMP_WAIT_POLICY is set to ACTIVE. With OMP_WAIT_POLICY=PASSIVE, the calculation time remains high and the processor is idle 50% of the time.

Oddly enough, when we use OpenMP within a single thread, the first parallelized loop (in a series of OpenMP calculations) executes 10 times slower than in the multithreaded case.

Upd: Our questions are:

a) Is there any way to reuse the OpenMP threads when OpenMP is used from several different application threads?

b) Why does running with OMP_WAIT_POLICY=PASSIVE slow everything down? Does it really take that long to wake the threads?

c) Is there any logical explanation for the first parallel block being so slow (even when waiting in active mode)?

Upd2: Please note that the issue is probably specific to the GNU OpenMP implementation (libgomp); icc doesn't have it.
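
For (c), a minimal sketch of how the cost of the first parallel region could be separated from that of later ones. This is not the code from the application described above: wall_seconds, time_regions and the loop body are placeholders invented for illustration, and the #ifdef _OPENMP guard only keeps the file buildable without -fopenmp (in that case it falls back to clock(), which reports CPU time rather than wall time).

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock seconds via OpenMP,
 * or CPU seconds via clock() when built without OpenMP. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: run the same trivial parallel loop `runs` times
 * and store the duration of each run in elapsed[0..runs-1]. */
void time_regions(double *elapsed, int runs, double *buf, long n)
{
    int r;
    long i;

    for (r = 0; r < runs; ++r) {
        double t0 = wall_seconds();

        /* The loop variable i is private to each thread in a parallel for. */
        #pragma omp parallel for
        for (i = 0; i < n; ++i)
            buf[i] = (double)i * 0.5;

        elapsed[r] = wall_seconds() - t0;
    }
}
```

Comparing elapsed[0] with the later entries, under each OMP_WAIT_POLICY setting, would show whether the extra time is a one-off startup cost or something that recurs.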

+1. I recently considered using OpenMP in a library used by a Qt app. @Sergey: Which platform are you on? — Fred Foo
@larsmans: We run on Linux and Windows. So far, however, this phenomenon has been studied only on Linux. — Sergey Levi
The question is too vague to answer. What are your OpenMP threads doing? Is there any data sharing? Any locks? Could you post a short code sample? I don't think oversubscription (running 2N threads) would be a big problem, though. — minjang
Sorry for the late answer. After some research, it appears that it doesn't actually matter what the threads do. Oversubscription, on the other hand, does seem to be the big problem: setting the number of OpenMP threads even to N+1 degrades performance drastically. The threads don't even need to be doing anything. I guess it might be due to some housekeeping that libgomp performs. — Sergey Levi

osgx · Accepted Answer · 2010-12-20T15:37:44

Try starting and stopping the OpenMP threads at runtime using omp_set_num_threads(1) and omp_set_num_threads(cpucount).

The call with 1 should stop all OpenMP worker threads, and the call with cpucount will restart them.

So, at the start of the program, call omp_set_num_threads(1). Just before an OpenMP-parallelized region you can start the OpenMP threads, even with OMP_WAIT_POLICY=ACTIVE, and they will not consume CPU before that point.

After the parallel region you can stop the threads again.

The omp_set_num_threads(cpucount) call is very slow, slower than waking threads with OMP_WAIT_POLICY=PASSIVE. This may be the reason for (c), if your libgomp starts its threads only at the first parallel region.
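
A rough sketch of the bracketing described above, not tested against your application. The helper names set_team_size, cpu_count and bracketed_sum are made up for illustration, omp_get_num_procs() stands in for cpucount, and the #ifdef _OPENMP guards are only there so the file also builds without -fopenmp. Whether libgomp actually shuts its workers down after omp_set_num_threads(1) is the assumption made above; this code does not verify it.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request n threads for subsequent parallel regions. */
void set_team_size(int n)
{
#ifdef _OPENMP
    omp_set_num_threads(n);
#else
    (void)n; /* built without OpenMP: nothing to do */
#endif
}

/* Hypothetical helper: processor count as reported by OpenMP, or 1. */
int cpu_count(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Sum an array in a parallel region, widening the team only for the
 * duration of the call and shrinking it back to one thread afterwards. */
double bracketed_sum(const double *data, size_t n)
{
    double total = 0.0;
    long i;

    set_team_size(cpu_count());   /* "start" the workers */

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)n; ++i)
        total += data[i];

    set_team_size(1);             /* "stop" them again */
    return total;
}
```

Call bracketed_sum(values, count) from the thread that owns the computation. As far as I know, omp_set_num_threads only changes the setting of the calling thread, so each application thread that enters a parallel region would have to do its own bracketing.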
