I recently discovered the concept of thread pools. As far as I understand, GCC, ICC, and MSVC all use thread pools with OpenMP. I'm curious to know what happens when I change the number of threads. For example, let's assume the default number of threads is eight. I create a team of eight threads, then in a later section I use four threads, and then I go back to eight.

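#pragma omp parallel for                  // default team size (eight here)
for (int i = 0; i < n; i++) { /* work */ }

#pragma omp parallel for num_threads(4)   // four threads for this loop only
for (int i = 0; i < n; i++) { /* work */ }

#pragma omp parallel for                  // back to the default team size
for (int i = 0; i < n; i++) { /* work */ }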

This is something I actually do now, because part of my code gets worse results with hyper-threading, so I lower the number of threads to the number of physical cores (for that part of the code only). What if I did the opposite (four threads, then eight, then four)?

Does the thread pool have to be recreated each time I change the number of threads? If not, does adding or removing threads cause any significant overhead?

What's the overhead for the thread pool, i.e. what fraction of the work per thread goes to the pool?

libgomp spawns additional threads when more are needed but does not kill already spawned ones and rather puts them to sleep in a docking barrier. The actual overhead could be measured using EPCC's OpenMP microbenchmarks. – Hristo Iliev

1 Answer

It might be late by now to answer this question. However, I am going to do so.

When you start with 8 threads, 7 new threads are created; together with your main thread, they form a team of 8. The first loop in your sample code is executed by this team, so the thread pool now holds 8 threads. Once they are done with this region, the worker threads go to sleep until they are woken up again.

When you reach the second parallel region with 4 threads, only 3 threads from the pool are woken up (3 pooled threads + your main thread). The remaining 4 threads stay asleep.

Then, as in the first parallel region, all 8 threads work together again to execute the third parallel region.
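The 8, 4, 8 sequence described above can be checked with a small sketch. The helper names `observed_team_size` and `report_three_regions` are made up for this illustration, and the `#ifdef _OPENMP` guards are only there so that the file still compiles (reporting a team of 1) when OpenMP is not enabled. Note that the runtime is allowed to hand out fewer threads than requested.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs one parallel region and returns the team size
   the runtime actually provided. requested <= 0 means "use the default". */
int observed_team_size(int requested)
{
    int size = 1;  /* value reported when compiled without OpenMP */
#ifdef _OPENMP
    if (requested > 0) {
        #pragma omp parallel num_threads(requested)
        {
            #pragma omp single
            size = omp_get_num_threads();
        }
    } else {
        #pragma omp parallel
        {
            #pragma omp single
            size = omp_get_num_threads();
        }
    }
#else
    (void)requested;  /* unused in a serial build */
#endif
    return size;
}

/* Mirrors the question: default team, then four threads, then default again. */
void report_three_regions(void)
{
    printf("region 1 (default): %d threads\n", observed_team_size(0));
    printf("region 2 (4 asked): %d threads\n", observed_team_size(4));
    printf("region 3 (default): %d threads\n", observed_team_size(0));
}
```

Calling `report_three_regions()` from `main` in a build with OpenMP enabled (for example `-fopenmp` with GCC) prints the three team sizes. The sketch only shows how many threads take part in each region; it cannot show whether the idle threads are asleep in the pool or were destroyed, which is up to the runtime.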


On the other hand, if you start with 4 threads and the second parallel region asks for 8, the OpenMP library reacts to this change and creates 4 extra threads to meet the request. Threads that have been created are usually not removed from the pool until the program ends, on the expectation that you might need them again later. This is the general approach that most OpenMP libraries follow. The reason is that creating new threads is expensive, so the libraries try to avoid it and postpone it for as long as they can.
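If you want a rough number for what switching team sizes costs on your machine, a minimal timing sketch along these lines may help. It is a crude stand-in, not the EPCC microbenchmarks mentioned in the comment; `now_seconds` and `mean_region_overhead` are names invented here, and both team sizes are assumed to be positive.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds with OpenMP; falls back to CPU time from clock()
   when the file is built without OpenMP. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: average seconds spent entering and leaving a nearly
   empty parallel region while alternating between two team sizes.
   Assumes small_team > 0 and large_team > 0. */
double mean_region_overhead(int small_team, int large_team, int repetitions)
{
    int sink = 0;  /* touched by every thread so the region is not empty */
    double start = now_seconds();
    for (int r = 0; r < repetitions; r++) {
        int team = (r % 2 == 0) ? small_team : large_team;
        (void)team;  /* silences the unused warning in a serial build */
        #pragma omp parallel num_threads(team)
        {
            #pragma omp atomic
            sink++;
        }
    }
    double elapsed = now_seconds() - start;
    (void)sink;
    return repetitions > 0 ? elapsed / repetitions : 0.0;
}
```

Comparing, say, `mean_region_overhead(4, 8, 10000)` against `mean_region_overhead(8, 8, 10000)` gives an idea of whether alternating team sizes costs more than reusing the same team. If the pool behaves as described above, the extra threads are created only once, so the difference should mostly reflect waking and parking threads rather than creating them.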


Hope this helps you and future readers here.