I recently discovered the concept of thread pools. As far as I understand, GCC, ICC, and MSVC all use thread pools with OpenMP. I'm curious to know what happens when I change the number of threads. For example, let's assume the default number of threads is eight. I create a team of eight threads, then in a later section I use four threads, and then I go back to eight.
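#pragma omp parallel for                  // default team size (eight here)
for(int i=0; i<n; i++) { /* work */ }
#pragma omp parallel for num_threads(4)   // four threads for this loop only
for(int i=0; i<n; i++) { /* work */ }
#pragma omp parallel for                  // back to the default of eight
for(int i=0; i<n; i++) { /* work */ }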
This is something I actually do now because part of my code gets worse results with hyper-threading, so I lower the number of threads to the number of physical cores (for that part of the code only). What if I did the opposite (four threads, then eight, then four)?
Does the thread pool have to be recreated each time I change the number of threads? If not, does adding or removing threads cause any significant overhead?
What's the overhead for the thread pool, i.e. what fraction of the work per thread goes to the pool?
libgomp spawns additional threads when more are needed but does not kill already spawned ones and rather puts them to sleep in a docking barrier. The actual overhead could be measured using EPCC's OpenMP microbenchmarks. – Hristo Iliev