I am not sure why OpenMP uses so many threads. It does not seem to be specific to the Microsoft implementation, because the Intel library shows the same behavior. My code has some compute-bound parallel sections that should not create or use more threads than I have cores. What I observe instead is that for n initiating threads, OpenMP creates roughly n * cores threads. That looks like a serious thread leak to me.
If I execute a "small" 32-bit application on a server, it can fail simply because 1000 OpenMP threads already need 2 GB of address space, leaving no memory for the application itself. That should not happen: I would expect a state-of-the-art thread pool to reuse its threads and to retire threads that are no longer used.
I have tried omp_set_num_threads(8) to limit the thread pool to 8 threads, but that appears to limit only the number of threads per initiating thread. Am I doing it all wrong, or is OpenMP simply not meant to be used this way?
On my 8-core machine, 5 threads started by my AsyncWorker class cause OpenMP to create 38 threads. I would expect only 8 threads to be created, reused across all 5 initiating threads.
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <omp.h>
#include <thread>
#include <vector>

class AsyncWorker {
public:
    // Add one thread that repeatedly enters an OpenMP parallel section.
    void start() {
        threads.push_back(std::thread(&AsyncWorker::threadFunc, this));
    }

    ~AsyncWorker() {
        for (auto& t : threads) {
            t.join();
        }
    }

private:
    void threadFunc() {
        std::atomic<int> counter{0};
        auto start = std::chrono::high_resolution_clock::now();
        std::chrono::milliseconds durationInMs{0};
        while (durationInMs.count() < 5000l) {
            // Each initiating thread seems to get its own thread pool. Why?
            // How can I limit the pool to the number of cores, and when
            // will the threads be closed?
            #pragma omp parallel
            {
                counter++;
                auto stop = std::chrono::high_resolution_clock::now();
                durationInMs = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
            }
        }
    }

    std::vector<std::thread> threads;
};

int main() {
    //omp_set_dynamic(0);
    //omp_set_nested(0);
    //omp_set_num_threads(8);
    {
        AsyncWorker foo;
        foo.start(); // 1
        foo.start(); // 2
        foo.start(); // 3
        foo.start(); // 4
        foo.start(); // 5
        system("pause"); // Windows-only
    }
    return 0;
}
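For completeness, the environment-variable equivalents of the calls above, as I understand them (the values are just examples, and the per-contention-group scoping is exactly the part I am unsure about):

```shell
# bash syntax; on Windows use "set VAR=value" before launching the process
export OMP_NUM_THREADS=8        # team size for each parallel region
export OMP_THREAD_LIMIT=8       # cap on OpenMP threads, per contention group
export OMP_WAIT_POLICY=PASSIVE  # idle workers block instead of spin-waiting
```

If OMP_THREAD_LIMIT is applied per contention group rather than per process, that would explain why every initiating thread still gets its own set of workers.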