I'm trying to understand how operating systems handle context switching in different threading models, to better understand why NIO performs better when there are large spikes in the number of requests. Setting aside the fact that there may be a hard limit on the number of threads, I'm curious how the blocking operations performed by that large number of requests affect resource utilization.
In a one-request-per-thread model, say a Servlet 2.5 based web application, if 499 threads are waiting on database IO and only one thread has work to do, does the OS context-switch through all 500 threads trying to find the one that needs work? My understanding is that to perform a context switch, the operating system has to save the current thread's state and restore the next thread's state; after doing so, it would discover that the thread doesn't need any CPU time and would keep context switching until it finds the one that does. Also, what does this look like in terms of server utilization? Is CPU utilization low because the machine is mostly bound by the overhead of swapping contexts in and out rather than actually computing anything?
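To make the scenario concrete, here is a toy sketch of what I have in mind (the class name and numbers are just illustrative, and `Thread.sleep` stands in for a blocking JDBC call):

```java
import java.util.concurrent.CountDownLatch;

// Toy model: 499 "request" threads block (simulating database IO via sleep)
// while one thread actually needs CPU time. Watching CPU usage while this
// runs is how I'm trying to reason about the scheduler's behavior.
public class BlockedThreadsDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);

        // 499 threads that are "waiting on database IO"
        for (int i = 0; i < 499; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(60_000); // stand-in for a blocking DB call
                } catch (InterruptedException ignored) {
                }
            });
            t.setDaemon(true); // let the JVM exit once the worker is done
            t.start();
        }

        // The one thread that actually has work to do
        Thread worker = new Thread(() -> {
            long sum = 0;
            for (long i = 0; i < 5_000_000_000L; i++) {
                sum += i; // busy work
            }
            System.out.println("worker finished: " + sum);
            done.countDown();
        });
        worker.start();

        done.await();
    }
}
```

The question is essentially whether the scheduler has to examine all 500 of these threads on every switch, or whether the 499 blocked ones are kept out of its consideration entirely until their IO completes.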
Thanks in advance for any help. If you can point me in the direction of books, textbooks, etc., I would really appreciate that as well.