I am not sure why OpenMP uses so many threads. It does not seem to be specific to the Microsoft implementation, because the Intel library shows the same behavior. My code has some compute-bound parallel sections that should not create or use more threads than I have cores. What I observe instead is that for n initiating threads, OpenMP creates roughly n * cores threads. That looks like a serious thread leak to me.
If I execute a "small" 32-bit application on a server, it can fail simply because 1000 OpenMP threads already need 2 GB of address space, leaving no memory for the application itself. That should not happen: I would expect a state-of-the-art thread pool to reuse its threads and to retire threads that are no longer used.
I have tried omp_set_num_threads(8) to limit the thread pool to 8 threads, but that appears to limit only the number of threads per initiating thread. Am I doing it all wrong, or is OpenMP simply not meant to be used this way?
On my 8-core machine, 5 threads started by my AsyncWorker class cause OpenMP to create 38 threads. I would expect only 8 threads to be created, reused across all 5 initiating threads.
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <omp.h>
#include <thread>
#include <vector>

class AsyncWorker {
public:
    // Add one thread that repeatedly enters an OpenMP parallel section.
    void start() {
        threads.push_back(std::thread(&AsyncWorker::threadFunc, this));
    }

    ~AsyncWorker() {
        for (auto& t : threads) {
            t.join();
        }
    }

private:
    void threadFunc() {
        std::atomic<int> counter{0};
        auto start = std::chrono::high_resolution_clock::now();
        std::chrono::milliseconds durationInMs{0};
        while (durationInMs.count() < 5000l) {
            // Each initiating thread seems to get its own thread pool. Why?
            // How can I limit the pool to the number of cores, and when
            // will the threads be closed?
            #pragma omp parallel
            {
                counter++;
                auto stop = std::chrono::high_resolution_clock::now();
                durationInMs = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
            }
        }
    }

    std::vector<std::thread> threads;
};

int main() {
    //omp_set_dynamic(0);
    //omp_set_nested(0);
    //omp_set_num_threads(8);
    {
        AsyncWorker foo;
        foo.start(); // 1
        foo.start(); // 2
        foo.start(); // 3
        foo.start(); // 4
        foo.start(); // 5
        system("pause"); // Windows-only
    }
    return 0;
}
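For completeness, the environment-variable equivalents of the calls above, as I understand them (the values are just examples, and the per-contention-group scoping is exactly the part I am unsure about):

```shell
# bash syntax; on Windows use "set VAR=value" before launching the process
export OMP_NUM_THREADS=8        # team size for each parallel region
export OMP_THREAD_LIMIT=8       # cap on OpenMP threads, per contention group
export OMP_WAIT_POLICY=PASSIVE  # idle workers block instead of spin-waiting
```

If OMP_THREAD_LIMIT is applied per contention group rather than per process, that would explain why every initiating thread still gets its own set of workers.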