7
votes

I am writing an application that uses a third-party library to perform heavy computations.

This library implements parallelism internally and spawns a given number of threads. I want to run several instances of this library (the count changes dynamically), and therefore end up heavily oversubscribing the CPU.

Is there any way I can increase the "time quantum" of all the threads in a process so that, e.g., all the threads with normal priority rarely context switch (yield) unless they block explicitly, e.g. on a semaphore?

That way I could possibly avoid most of the performance overhead of oversubscribing the CPU. Note that in this case I don't care if a thread is starved for a few seconds.

EDIT:

One complicated way of doing this is to perform thread scheduling manually.

  1. Enumerate all the threads with a specific priority (e.g. normal).
  2. Suspend all of them.
  3. Create a loop that resumes/suspends the threads every e.g. 40 ms and makes sure no more threads than the current CPU count are running.

Are there any major drawbacks to this approach? I'm not sure what the overhead of suspending/resuming a thread is.
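
Something like the following is what I have in mind (Windows-only sketch; error handling is mostly omitted, and the 40 ms interval and the round-robin policy are just the assumptions from the list above):

    // Sketch only: cooperative "manual scheduling" of this process's
    // normal-priority threads. Threads created after the snapshot are
    // not covered, and no error handling is shown.
    #include <windows.h>
    #include <tlhelp32.h>
    #include <vector>

    static std::vector<HANDLE> CollectNormalPriorityThreads()
    {
        std::vector<HANDLE> threads;
        DWORD pid  = GetCurrentProcessId();
        DWORD self = GetCurrentThreadId();

        HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
        if (snap == INVALID_HANDLE_VALUE) return threads;

        THREADENTRY32 te = { sizeof(te) };
        for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
            if (te.th32OwnerProcessID != pid || te.th32ThreadID == self) continue;
            HANDLE h = OpenThread(THREAD_SUSPEND_RESUME | THREAD_QUERY_INFORMATION,
                                  FALSE, te.th32ThreadID);
            if (h && GetThreadPriority(h) == THREAD_PRIORITY_NORMAL)
                threads.push_back(h);          // step 1: normal-priority threads only
            else if (h)
                CloseHandle(h);
        }
        CloseHandle(snap);
        return threads;
    }

    static void ManualScheduleLoop()
    {
        std::vector<HANDLE> threads = CollectNormalPriorityThreads();
        for (HANDLE h : threads) SuspendThread(h);   // step 2: park everything

        SYSTEM_INFO si;
        GetSystemInfo(&si);
        const size_t slots = si.dwNumberOfProcessors; // at most one thread per CPU
        size_t next = 0;

        for (;;) {                                    // step 3: time-slice by hand
            std::vector<HANDLE> running;
            for (size_t i = 0; i < slots && i < threads.size(); ++i) {
                HANDLE h = threads[(next + i) % threads.size()];
                ResumeThread(h);                      // let this batch run...
                running.push_back(h);
            }
            Sleep(40);                                // assumed 40 ms "quantum"
            for (HANDLE h : running) SuspendThread(h); // ...then park it again
            next = (next + slots) % (threads.empty() ? 1 : threads.size());
        }
    }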

2
Are the lib instances interdependent? If not, why can you not avoid overloading the available cores by running as many threads as cores, in the manner of a thread pool? – Martin James
Well, since the number of instances changes dynamically, that is difficult. I would need to initialize/reinitialize the library multiple times to change the thread count depending on the current load. – ronag
If you used a thread pool, the number of library instances running at any one time would be the same as the number of threads in the pool. Other library instances would just queue up until pool threads became available to process them. There seems to be no point in applying CPU to an instance if the result is overloading? – Martin James
If this is Windows, perhaps you could decrease the tick rate by using timeBeginPeriod() and timeEndPeriod() to set the ticker to a higher value. I think the default is 15.625 ms (64 Hz). – rcgldr
If you have 8 cores, start 8 threads. The 8 threads pop library instances from a blocking queue and process them as they become available. If more than 8 instances are queued up, 8 get processed and the remainder queue up. When a thread has completed running an instance, it loops back to try and pop another from the queue. Since there are always just the 8 threads, there can be no overloading, no matter how many instances are submitted to the queue. – Martin James
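
A minimal sketch of the pool Martin James describes in the comments, assuming each library instance can be wrapped in a callable job (InstancePool and Submit are placeholder names, not part of the actual library):

    // Fixed-size pool: one worker per core, library instances queue up as jobs.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class InstancePool {
    public:
        explicit InstancePool(unsigned workers = std::thread::hardware_concurrency()) {
            for (unsigned i = 0; i < workers; ++i)
                workers_.emplace_back([this] { Run(); });
        }

        ~InstancePool() {
            {
                std::lock_guard<std::mutex> lock(m_);
                done_ = true;
            }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }

        // Submit one library instance (wrapped in a callable) to the queue.
        void Submit(std::function<void()> job) {
            {
                std::lock_guard<std::mutex> lock(m_);
                jobs_.push(std::move(job));
            }
            cv_.notify_one();
        }

    private:
        void Run() {
            for (;;) {
                std::function<void()> job;
                {
                    std::unique_lock<std::mutex> lock(m_);
                    cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                    if (jobs_.empty()) return;   // done_ set and nothing left to do
                    job = std::move(jobs_.front());
                    jobs_.pop();
                }
                job();                           // runs one library instance to completion
            }
        }

        std::mutex m_;
        std::condition_variable cv_;
        std::queue<std::function<void()>> jobs_;
        std::vector<std::thread> workers_;
        bool done_ = false;
    };

Note that each submitted instance would still have to be configured to use a single internal thread (or the library reinitialized accordingly), otherwise the pool only bounds the number of instances, not the total thread count.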

2 Answers

4
votes

There is nothing special you need to do. Any decent scheduler will not allow unforced context switches to consume a significant fraction of CPU resources. Any operating system that doesn't have a decent scheduler should not be used.

The performance overhead of oversubscribing the CPU is not the cost of unforced context switches. Why? Because the scheduler can simply avoid those. The scheduler only performs an unforced context switch when that has a benefit. The performance costs are:

  1. It can take longer to finish a job because more work will be done on other jobs between when the job is started and when the job finishes.

  2. Additional threads consume memory for their stacks and other related tracking information.

  3. More threads generally means more contention (for example, when memory is allocated) which can mean more forced context switches where a thread has to be switched out because it can't make forward progress.

You only want to try to change the scheduler's behavior when you know something significant that the scheduler doesn't know. There is nothing like that going on here. So the default behavior is what you want.
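
If you want to convince yourself on your own machine, a rough benchmark like the following splits the same fixed amount of CPU-bound work over N threads and over 8*N threads; on a decent scheduler the wall-clock times come out close. The iteration count and the volatile busy-work are purely illustrative:

    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Meaningless CPU-bound work; volatile keeps the loop from being optimized away.
    static void BusyWork(unsigned long long iterations) {
        volatile unsigned long long x = 0;
        for (unsigned long long i = 0; i < iterations; ++i) x += i;
    }

    // Run the same total amount of work split across the given number of threads.
    static double RunWith(unsigned threads, unsigned long long totalIterations) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < threads; ++i)
            pool.emplace_back(BusyWork, totalIterations / threads);
        for (auto& t : pool) t.join();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    }

    int main() {
        const unsigned cores = std::thread::hardware_concurrency();
        const unsigned long long work = 4'000'000'000ULL;   // arbitrary
        std::printf("%u threads: %.2f s\n", cores,     RunWith(cores, work));
        std::printf("%u threads: %.2f s\n", cores * 8, RunWith(cores * 8, work));
    }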

2
votes

Are there any major drawbacks to this approach? I'm not sure what the overhead of suspending/resuming a thread is.

Yes. Suspending/resuming threads from user-mode code is a very dangerous activity and should (almost) never be used. Moreover, we should not use these mechanisms to achieve something that any modern scheduler already does for us. This is also mentioned in the other answer to this question.

The above applies to any operating system, but the tags on this question suggest it is about a Microsoft Windows based system. If we read about SuspendThread() on MSDN, we find the following:

"This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread".

So consider the scenario in which a thread has acquired some resource implicitly (i.e. not in your own code, but inside a library or in kernel mode). If we suspend that thread, the result is a mysterious deadlock, because other threads of the process end up waiting for that resource. Since we can never be sure, at any point in our program, what resources a running thread currently holds, suspending/resuming threads is not a good idea.
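
To make the scenario concrete, here is a contrived Windows sketch: the worker is very likely to be suspended while it happens to hold the critical section, after which the suspending thread blocks forever trying to enter it. In real code the lock could just as well be the heap lock or one internal to the library, which you cannot see:

    #include <windows.h>
    #include <cstdio>

    CRITICAL_SECTION g_cs;

    DWORD WINAPI Worker(LPVOID) {
        for (;;) {
            EnterCriticalSection(&g_cs);
            // ... pretend to touch some shared state ...
            LeaveCriticalSection(&g_cs);
        }
    }

    int main() {
        InitializeCriticalSection(&g_cs);
        HANDLE worker = CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr);

        Sleep(100);                   // let the worker spin for a while
        SuspendThread(worker);        // worker may now be frozen *inside* the lock

        std::printf("trying to take the lock the suspended thread may hold...\n");
        EnterCriticalSection(&g_cs);  // frequently never returns: deadlock
        std::printf("got it (this run happened to be lucky)\n");
        LeaveCriticalSection(&g_cs);

        ResumeThread(worker);
        return 0;
    }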