I'm trying to use OpenMP to speed up realtime audio processing. The algorithm looks like this:
    preparation
    for (int i = 0; i < 1024; i++)
        something quite demanding
    finalization
When not parallelized, it took about 3% of CPU according to the system meter. When I parallelized the main loop, OpenMP used 8 threads (4-core i7 with hyperthreading) and the main thread's consumption went down to 2%, so processing took about a third less time. However, the system performance meter started showing 100% (!!) overall CPU load, with all cores fully loaded.
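For illustration, here is a minimal sketch of what the parallelized version might look like. It is not the real code: `demanding_work` and `process_block` are placeholder names, the per-iteration work is made up, and it assumes the iterations are independent of each other. With MSVC it would be built with `/openmp`.

```cpp
#include <cmath>
#include <vector>

// Placeholder for the "something quite demanding" step; the real
// per-iteration work is not shown here, so this is an assumption.
static float demanding_work(int i)
{
    float acc = 0.0f;
    for (int k = 1; k <= 1000; ++k)
        acc += std::sin(0.001f * i * k) / k;
    return acc;
}

// Hypothetical entry point, called once per audio data request.
void process_block(std::vector<float>& out)
{
    // preparation (serial)
    out.assign(1024, 0.0f);

    // Main loop, split across the OpenMP thread team. Each iteration
    // writes only out[i], so the iterations do not depend on each other.
    // The index is a signed int, as OpenMP 2.0 (the version MSVC
    // implements) requires.
    #pragma omp parallel for
    for (int i = 0; i < 1024; ++i)
        out[i] = demanding_work(i);

    // finalization (serial) would go here
}
```

The parallel region is entered and left once per call, so between calls the worker threads have nothing to do until the next audio request arrives.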
It looks as if the threads are burning CPU while doing nothing, even while waiting for the next audio data request. Any ideas what could cause that? The faster response is nice, but since many similar processors may be running at the same moment, 100% CPU usage is just not usable. Perhaps the OpenMP threads are actively waiting (spinning) for more work?
I'm using MSVC 2013.