7
votes

I am writing an application that uses a third-party library to perform heavy computations.

This library implements parallelism internally and spawns a given number of threads. I want to run several instances of this library (the count changes dynamically), and therefore end up heavily oversubscribing the CPU.

Is there any way I can increase the "time quantum" of all the threads in a process so that, e.g., all the threads with normal priority rarely context switch (yield) unless they block explicitly, e.g. on a semaphore?

That way I could possibly avoid most of the performance overhead of oversubscribing the CPU. Note that in this case I don't care if a thread is starved for a few seconds.

EDIT:

One complicated way of doing this is to perform thread scheduling manually.

  1. Enumerate all the threads with a specific priority (e.g. normal).
  2. Suspend all of them.
  3. Create a loop that resumes/suspends the threads every e.g. 40 ms and makes sure no more threads than the current CPU count are running.

Are there any major drawbacks to this approach? I'm not sure what the overhead of suspending/resuming a thread is.
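
Something like the following is what I have in mind (Windows-only sketch; error handling is mostly omitted, and the 40 ms interval and the round-robin policy are just the assumptions from the list above):

    // Sketch only: cooperative "manual scheduling" of this process's
    // normal-priority threads. Threads created after the snapshot are
    // not covered, and no error handling is shown.
    #include <windows.h>
    #include <tlhelp32.h>
    #include <vector>

    static std::vector<HANDLE> CollectNormalPriorityThreads()
    {
        std::vector<HANDLE> threads;
        DWORD pid  = GetCurrentProcessId();
        DWORD self = GetCurrentThreadId();

        HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
        if (snap == INVALID_HANDLE_VALUE) return threads;

        THREADENTRY32 te = { sizeof(te) };
        for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
            if (te.th32OwnerProcessID != pid || te.th32ThreadID == self) continue;
            HANDLE h = OpenThread(THREAD_SUSPEND_RESUME | THREAD_QUERY_INFORMATION,
                                  FALSE, te.th32ThreadID);
            if (h && GetThreadPriority(h) == THREAD_PRIORITY_NORMAL)
                threads.push_back(h);          // step 1: normal-priority threads only
            else if (h)
                CloseHandle(h);
        }
        CloseHandle(snap);
        return threads;
    }

    static void ManualScheduleLoop()
    {
        std::vector<HANDLE> threads = CollectNormalPriorityThreads();
        for (HANDLE h : threads) SuspendThread(h);   // step 2: park everything

        SYSTEM_INFO si;
        GetSystemInfo(&si);
        const size_t slots = si.dwNumberOfProcessors; // at most one thread per CPU
        size_t next = 0;

        for (;;) {                                    // step 3: time-slice by hand
            std::vector<HANDLE> running;
            for (size_t i = 0; i < slots && i < threads.size(); ++i) {
                HANDLE h = threads[(next + i) % threads.size()];
                ResumeThread(h);                      // let this batch run...
                running.push_back(h);
            }
            Sleep(40);                                // assumed 40 ms "quantum"
            for (HANDLE h : running) SuspendThread(h); // ...then park it again
            next = (next + slots) % (threads.empty() ? 1 : threads.size());
        }
    }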

2
Are the lib instances interdependent? If not, why can you not avoid overloading the available cores by running as many threads as cores, in the manner of a thread pool? – Martin James
Well, since the number of instances changes dynamically, that is difficult. I would need to initialize/reinitialize the library multiple times to change the thread count depending on the current load. – ronag
If you used a thread pool, the number of library instances running at any one time would be the same as the number of threads in the pool. Other library instances would just queue up until pool threads became available to process them. There seems to be no point in applying CPU to an instance if the result is overloading? – Martin James
If this is Windows, perhaps you could decrease the tick rate by using timeBeginPeriod() and timeEndPeriod() to set the ticker to a higher value. I think the default is 15.625 ms (64 Hz). – rcgldr
If you have 8 cores, start 8 threads. The 8 threads pop library instances from a blocking queue and process them as they become available. If more than 8 instances are queued up, 8 get processed and the remainder queue up. When a thread has completed running an instance, it loops back to try and pop another from the queue. Since there are always just the 8 threads, there can be no overloading, no matter how many instances are submitted to the queue. – Martin James
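
A minimal sketch of the pool Martin James describes in the comments, assuming each library instance can be wrapped in a callable job (InstancePool and Submit are placeholder names, not part of the actual library):

    // Fixed-size pool: one worker per core, library instances queue up as jobs.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class InstancePool {
    public:
        explicit InstancePool(unsigned workers = std::thread::hardware_concurrency()) {
            for (unsigned i = 0; i < workers; ++i)
                workers_.emplace_back([this] { Run(); });
        }

        ~InstancePool() {
            {
                std::lock_guard<std::mutex> lock(m_);
                done_ = true;
            }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }

        // Submit one library instance (wrapped in a callable) to the queue.
        void Submit(std::function<void()> job) {
            {
                std::lock_guard<std::mutex> lock(m_);
                jobs_.push(std::move(job));
            }
            cv_.notify_one();
        }

    private:
        void Run() {
            for (;;) {
                std::function<void()> job;
                {
                    std::unique_lock<std::mutex> lock(m_);
                    cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                    if (jobs_.empty()) return;   // done_ set and nothing left to do
                    job = std::move(jobs_.front());
                    jobs_.pop();
                }
                job();                           // runs one library instance to completion
            }
        }

        std::mutex m_;
        std::condition_variable cv_;
        std::queue<std::function<void()>> jobs_;
        std::vector<std::thread> workers_;
        bool done_ = false;
    };

Note that each submitted instance would still have to be configured to use a single internal thread (or the library reinitialized accordingly), otherwise the pool only bounds the number of instances, not the total thread count.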

2 Answers

4
votes

There is nothing special you need to do. Any decent scheduler will not allow unforced context switches to consume a significant fraction of CPU resources. Any operating system that doesn't have a decent scheduler should not be used.

The performance overhead of oversubscribing the CPU is not the cost of unforced context switches. Why? Because the scheduler can simply avoid those. The scheduler only performs an unforced context switch when that has a benefit. The performance costs are:

  1. It can take longer to finish a job because more work will be done on other jobs between when the job is started and when the job finishes.

  2. Additional threads consume memory for their stacks and other related tracking information.

  3. More threads generally means more contention (for example, when memory is allocated) which can mean more forced context switches where a thread has to be switched out because it can't make forward progress.

You only want to try to change the scheduler's behavior when you know something significant that the scheduler doesn't know. There is nothing like that going on here. So the default behavior is what you want.
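
If you want to convince yourself on your own machine, a rough benchmark like the following splits the same fixed amount of CPU-bound work over N threads and over 8*N threads; on a decent scheduler the wall-clock times come out close. The iteration count and the volatile busy-work are purely illustrative:

    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Meaningless CPU-bound work; volatile keeps the loop from being optimized away.
    static void BusyWork(unsigned long long iterations) {
        volatile unsigned long long x = 0;
        for (unsigned long long i = 0; i < iterations; ++i) x += i;
    }

    // Run the same total amount of work split across the given number of threads.
    static double RunWith(unsigned threads, unsigned long long totalIterations) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < threads; ++i)
            pool.emplace_back(BusyWork, totalIterations / threads);
        for (auto& t : pool) t.join();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    }

    int main() {
        const unsigned cores = std::thread::hardware_concurrency();
        const unsigned long long work = 4'000'000'000ULL;   // arbitrary
        std::printf("%u threads: %.2f s\n", cores,     RunWith(cores, work));
        std::printf("%u threads: %.2f s\n", cores * 8, RunWith(cores * 8, work));
    }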

2
votes

Are there any major drawbacks to this approach? I'm not sure what the overhead of suspending/resuming a thread is.

Yes. Suspending/resuming threads from user-mode code is a very dangerous activity and should (almost) never be used. Moreover, we should not use these mechanisms to achieve something that any modern scheduler already does for us. This is also mentioned in the other answer to this question.

The above applies to any operating system, but the tags on this question suggest it is about a Microsoft Windows based system. If we read about SuspendThread() on MSDN, we find the following:

"This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread".

So consider the scenario in which a thread has acquired some resource implicitly (i.e. not in your own code, but inside a library or in kernel mode). If we suspend that thread, the result is a mysterious deadlock, because other threads of the process end up waiting for that resource. Since we can never be sure, at any point in our program, what resources a running thread currently holds, suspending/resuming threads is not a good idea.
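
To make the scenario concrete, here is a contrived Windows sketch: the worker is very likely to be suspended while it happens to hold the critical section, after which the suspending thread blocks forever trying to enter it. In real code the lock could just as well be the heap lock or one internal to the library, which you cannot see:

    #include <windows.h>
    #include <cstdio>

    CRITICAL_SECTION g_cs;

    DWORD WINAPI Worker(LPVOID) {
        for (;;) {
            EnterCriticalSection(&g_cs);
            // ... pretend to touch some shared state ...
            LeaveCriticalSection(&g_cs);
        }
    }

    int main() {
        InitializeCriticalSection(&g_cs);
        HANDLE worker = CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr);

        Sleep(100);                   // let the worker spin for a while
        SuspendThread(worker);        // worker may now be frozen *inside* the lock

        std::printf("trying to take the lock the suspended thread may hold...\n");
        EnterCriticalSection(&g_cs);  // frequently never returns: deadlock
        std::printf("got it (this run happened to be lucky)\n");
        LeaveCriticalSection(&g_cs);

        ResumeThread(worker);
        return 0;
    }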