linux 2.6 scheduling and preemption - preempt_count use

Question

A little discussion before the question. The Linux 2.4 kernel is non-preemptive, so if a context switch becomes necessary while we are processing a system call in kernel mode, we only call set_need_resched to raise a flag; then, when we return to user mode, we check the flag and perform the context switch.
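To make the 2.4 scheme concrete, here is a minimal userspace model in C. It is not kernel code, and the struct and function names are hypothetical. An interrupt arriving during a system call only raises a flag, and the actual switch is deferred to the return-to-user path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the 2.4 behaviour; names are illustrative only. */
struct model_task_24 {
    bool in_kernel;      /* currently executing a system call */
    bool need_resched;   /* "reschedule me soon" flag */
};

static int model_switches = 0;   /* counts simulated calls to schedule() */

/* E.g. called from the timer interrupt while the task is inside a system call:
   a non-preemptive kernel may not switch here, so it only raises the flag. */
static void model_set_need_resched(struct model_task_24 *t)
{
    t->need_resched = true;
}

/* Model of the return-to-user-mode path: the point where the flag is honoured. */
static void model_return_to_user(struct model_task_24 *t)
{
    assert(t->in_kernel);
    t->in_kernel = false;
    if (t->need_resched) {
        t->need_resched = false;
        model_switches++;        /* stands in for calling schedule() */
    }
}
```

The key property is that nothing happens between `model_set_need_resched()` and `model_return_to_user()`: however long the system call runs, the switch waits.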

Let's compare this to Linux 2.6, which has a preemptive kernel. We can't just take the 2.4 kernel and replace set_need_resched (raising a flag) with schedule() (rescheduling directly), so Linux 2.6 has a counter, preempt_count, which is incremented on every spin_lock() and decremented on every spin_unlock().
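A minimal userspace model of that counter in C, assuming a single thread and a lock that is just a flag. All names are hypothetical; in the real 2.6 kernel the counter lives in the thread's `thread_info`, and `spin_lock()` raises it through `preempt_disable()`.

```c
#include <assert.h>

/* Hypothetical model of the per-thread preemption counter. */
static int model_preempt_count = 0;

struct model_spinlock { int locked; };

static void model_spin_lock(struct model_spinlock *l)
{
    model_preempt_count++;   /* like preempt_disable(): raised before taking the lock */
    assert(!l->locked);      /* single-threaded model: the lock must be free */
    l->locked = 1;
}

static void model_spin_unlock(struct model_spinlock *l)
{
    assert(l->locked);
    l->locked = 0;
    model_preempt_count--;   /* like preempt_enable(); the real one may also reschedule here */
    assert(model_preempt_count >= 0);
}

/* Preemption is allowed only when no spinlock (or other disabler) is held. */
static int model_preemptible(void)
{
    return model_preempt_count == 0;
}
```

Because the counter nests, holding two spinlocks at once keeps the model non-preemptible until both have been released.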

In effect, the preempt_count field determines whether the kernel can be preempted. For example, on return from a clock interrupt, if the condition:

(current->need_resched == 1) && (current->preempt_count == 0)

is true, then the kernel performs a context switch.
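The check from the question can be written as a small C predicate over a hypothetical task struct. This is a userspace model only; the real 2.6 entry code is architecture-specific and differs in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two values the question's condition refers to. */
struct model_task {
    bool need_resched;
    int preempt_count;
};

/* Model of the decision on return from a (clock) interrupt to kernel mode.
   Returns true if a context switch would be performed. */
static bool model_irq_return_to_kernel(struct model_task *cur)
{
    assert(cur->preempt_count >= 0);
    if (cur->need_resched && cur->preempt_count == 0) {
        cur->need_resched = false;   /* stands in for schedule(), which clears the flag */
        return true;
    }
    return false;                    /* preemption deferred; the flag stays set */
}
```

When preempt_count is non-zero the request is not lost in this model: need_resched stays set, so a later check (for instance once the count drops back to zero) can still honour it.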

My question is: why does the Linux 2.6 kernel disable preemption while a spinlock is held?

What could happen if the kernel didn't disable preemption? Can you give a concrete example, in as much detail as you can?

Thank you.

Possible duplicate of Why linux disables kernel preemption after the kernel code holds a spinlock? — Tsyvarev

employee of the month · Accepted Answer · 2017-12-30T23:01:34

Have you read about sleepable locks like mutexes or semaphores?

With those, if the lock cannot be taken, the thread can put itself to sleep and, for example, lend its priority to the lock owner so that the owner (if it is not running) can finish its work sooner. In particular, the thread that wants the lock may be running on the very CPU where the lock owner is waiting to continue.

Spinlocks, on the other hand, assume that nobody sleeps while holding them. In particular, this means that busy waiting (i.e. staying on the CPU) does not block the lock owner: if the lock is held, the owner is running somewhere. But suppose the owner went to sleep (or was preempted) while holding the lock. The waiting thread would then waste its time spinning, because the owner cannot get back to work. The waiter would only be preempted once the scheduler decided it had run long enough, and even then no relationship is established between the waiter and the owner. So the waiter may well get back on the CPU to continue busy waiting while the lock owner still has not had a chance to run.
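The scenario above can be played out in a toy uniprocessor model in C. This is a deliberately simplified sketch under loud assumptions: one CPU, two tasks, plain round-robin with a fixed time slice, and a lock that is just a flag. The function `simulate` and its parameters are hypothetical, not kernel APIs. It counts the ticks the waiter burns spinning, with and without preemption being disabled while the owner holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical uniprocessor model. Task 0 (owner) holds a spinlock and needs
   `work` more ticks to finish its critical section; task 1 (waiter) spins on
   the same lock. The scheduler is plain round robin with `slice` ticks per turn.
   Returns the number of ticks the waiter wasted spinning before the unlock. */
static int simulate(int work, int slice, bool preempt_off_while_locked)
{
    assert(work > 0 && slice > 0);
    bool locked = true;
    int current = 0;   /* the owner is running when the model starts */
    int ran = 0;       /* ticks used in the current slice */
    int wasted = 0;

    while (locked) {
        if (current == 0) {
            if (--work == 0)
                locked = false;   /* owner reaches spin_unlock() */
        } else {
            wasted++;             /* waiter burns a tick busy waiting */
        }
        ran++;
        /* Timer tick: switch tasks when the slice is used up, unless the owner
           still holds the lock and preemption is disabled (preempt_count > 0). */
        bool owner_nonpreemptible = current == 0 && locked && preempt_off_while_locked;
        if (ran >= slice && !owner_nonpreemptible) {
            current = 1 - current;
            ran = 0;
        }
    }
    return wasted;
}
```

For example, `simulate(10, 3, true)` returns 0 because the owner runs its whole critical section uninterrupted, while `simulate(10, 3, false)` returns 9: the waiter spins through three full slices, nearly as much CPU time as the owner needed. If the scheduler instead always preferred the waiter (say, because of a higher priority), the owner would never run again and the loop would never terminate, which is the uniprocessor form of the forward-progress problem the answer describes.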

So at the very least it would be a huge performance problem; in practice, under high load it would lead to livelocks in which the kernel is unable to make forward progress.
