5 votes

I have been working on a quantum simulation. Each time step, a potential function is calculated, one step of the solver is iterated, and then a series of measurements is conducted. These three processes are easily parallelizable, and I've already made sure they don't interfere with each other. Additionally, there is some work that is fairly simple but should not be done in parallel. An outline of the setup is shown below.

omp_set_num_threads(3);
#pragma omp parallel
{
    while (notDone) {
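        // The three independent stages, one section per thread.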
        #pragma omp sections
        {
            #pragma omp section
            {
                createPotential();
            }
            #pragma omp section
            {
                iterateWaveFunction();
            }
            #pragma omp section
            {
                takeMeasurements();
            }
        }
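        // Simple serial work: one thread executes this, the others wait.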
        #pragma omp single
        {
            doSimpleThings();
        }
    }
}

The code works just fine! I see a speed increase of about 30%, mostly from the measurements running alongside the TDSE solver. However, the program goes from using about 10% CPU (about one thread) to 35% (about three threads). That would make sense if the potential function, TDSE iterator, and measurements all took equally long, but they do not. Since the total work is unchanged and it now completes about 1.3x faster, I would expect roughly 1.3 × 10% ≈ 13%, i.e. something on the order of 15% CPU usage.

I have a feeling this has to do with the overhead of running these three threads within the while loop. Replacing

#pragma omp sections

with

#pragma omp parallel sections

(and omitting the two lines just before the loop) changes nothing; that variant is sketched below. Is there a more efficient way to run this setup? I'm not sure whether the threads are constantly being recreated, or whether each thread holds up an entire core while it waits for the others to finish. If I increase the number of threads from 3 to any other number, the program uses as many resources as it wants (which could be all of the CPU) and gets no performance gain.
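Here is the variant in outline (a sketch of what I described, assuming the thread count is still pinned to 3 via a num_threads clause):

while (notDone) {
    // A parallel region is opened and closed every iteration.
    #pragma omp parallel sections num_threads(3)
    {
        #pragma omp section
        {
            createPotential();
        }
        #pragma omp section
        {
            iterateWaveFunction();
        }
        #pragma omp section
        {
            takeMeasurements();
        }
    }
    // Outside the parallel region this is ordinary serial code.
    doSimpleThings();
}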

OpenMP does sometimes let the threads spin for no apparent reason, though I don't know exactly what causes it. Given that you're just trying to run 3 functions in parallel, you might try the STL; something like std::async could work. You can even make your main thread wait for the 3 to finish by giving them a return value and binding it to a std::future object. – Qubit
I think there is a barrier at the single statement, so the other threads should sleep. Also, I would look into OMP_WAIT_POLICY: stackoverflow.com/a/12617270/2542702 – Z boson
Maybe this is an issue with sections and could be a reason to use tasks. Have you tried using tasks instead of sections? Sections seem to me like the old way of doing things from before tasks were added in OpenMP 3.0. – Z boson
You could try #pragma omp sections nowait and then add a #pragma omp barrier if you need one. – Z boson
Tasks are probably a better fit: stackoverflow.com/a/13789119/2542702 – Z boson

1 Answer

1 vote

I've tried many options, including using tasks instead of sections (with the same results) and switching compilers. As suggested by Qubit, I also tried std::async, and that was the solution! The CPU usage dropped from about 50% to 30% (this is on a different computer from the original post, so the numbers differ; basically it's a 1.5x performance gain for 1.6x the CPU usage). This is much closer to what I expected for this computer.
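The task version followed the usual single-producer pattern, roughly like this (a sketch, not my exact code):

#pragma omp parallel num_threads(3)
{
    // One thread generates the tasks; the whole team executes them.
    #pragma omp single
    {
        while (notDone) {
            #pragma omp task
            { createPotential(); }
            #pragma omp task
            { iterateWaveFunction(); }
            #pragma omp task
            { takeMeasurements(); }
            // Wait for all three tasks before the serial step.
            #pragma omp taskwait
            doSimpleThings();
        }
    }
}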

For reference, here is the new code outline:

#include <future>

void SimulationManager::runParallel(){
    // Pointers to the three member functions (no parentheses: we are not calling them here).
    auto rV = &SimulationManager::createPotential;
    auto rS = &SimulationManager::iterateWaveFunction;
    auto rM = &SimulationManager::takeMeasurements;
    std::future<int> f1, f2, f3;
    while(notDone){
        // Launch the three stages concurrently on this object.
        f1 = std::async(rV, this);
        f2 = std::async(rS, this);
        f3 = std::async(rM, this);
        // Block until all three are done before the serial step.
        f1.get(); f2.get(); f3.get();
        doSimpleThings();
    }
}

The three original functions are called using std::async, and then I use the future variables f1, f2, and f3 to collect everything back to a single thread and avoid access issues.
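One caveat to be aware of: with the default launch policy, std::async is allowed to defer a call and run it lazily on the thread that calls get(), which would serialize the loop again. If that ever happens, an explicit policy forces each call onto its own thread:

f1 = std::async(std::launch::async, rV, this);
f2 = std::async(std::launch::async, rS, this);
f3 = std::async(std::launch::async, rM, this);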