According to C10k and this paper, the throughput of one-thread-per-connection servers degrades as more clients connect and more threads are created. Both sources attribute this to context switching: the more threads exist, the more time is spent switching between them relative to the actual work those threads do. Evented servers don't seem to suffer as much performance degradation at high connection counts.
However, evented servers also switch context between clients; they just do it in userspace.
- Why are these userspace context switches faster than kernel thread context switches?
- What exactly does a kernel context switch do that's so much more expensive?
- How expensive is a kernel context switch exactly? How much time does it take?
- Does kernel context switching time depend on the number of threads?
I'm mostly interested in how the Linux kernel handles context switching, but information about other OSes is welcome too.
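
To make the "how much time does it take?" question concrete, here is a rough sketch of one way it could be measured: two processes bounce a single byte back and forth over a pair of pipes, so each side has to block and be woken again on every round trip. The function name `pingpong_roundtrip_seconds` and the iteration count are my own choices for illustration, not anything from the sources above. The number it reports also includes pipe syscall and Python interpreter overhead, so it should be read as a loose upper bound rather than the cost of a switch itself. It assumes a Unix-like system where `os.fork` is available.

```python
import os
import time


def pingpong_roundtrip_seconds(iterations=10000):
    """Return the average time of one parent -> child -> parent round trip."""
    # One pipe per direction.
    p2c_r, p2c_w = os.pipe()
    c2p_r, c2p_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back until the parent closes its write end.
        os.close(p2c_w)
        os.close(c2p_r)
        while True:
            data = os.read(p2c_r, 1)
            if not data:
                break
            os.write(c2p_w, data)
        os._exit(0)

    # Parent: send a byte, then block until the child echoes it.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start

    # Closing the write end makes the child's read return b"" so it exits.
    os.close(p2c_w)
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return elapsed / iterations


if __name__ == "__main__":
    rt = pingpong_roundtrip_seconds()
    print(f"average round trip: {rt * 1e6:.2f} microseconds")
```

On a multi-core machine the two processes may run on different CPUs, in which case the figure reflects wake-up latency more than a switch between them on one core. Pinning both to the same CPU (for example with `os.sched_setaffinity` on Linux) would probably bring the measurement closer to what the question is asking about.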