Linux Device Drivers: Sleeping whiile Holding Lock Revisited

Question

I've always been taught that sleeping while holding a spinlock in kernel code is a no-no. The reason for this is the following scenario:

Thread A acquires the lock, does some work, and calls a kernel function that goes to sleep, releasing the CPU.
Thread B now runs, which attempts to acquire the lock, but can't since its held by Thread A.
This is a deadlock. Thread A can't get woken up since Thread B is constantly spinning and Thread B can't make any progress since it can't get the lock.

I am maintaining a driver which uses lots of locks and inside some of the locked sections are obvious things like memory allocations, copy_to_user(), etc.

However, I am not entirely convinced that I have a bug on my hands. Using the above scenario, Thread A is user context (namely inside the implementation of read()) while Thread B is the interrupt context (inside the ISR). The lock is locked via spin_lock_irqsave(). As a result, Thread B cannot run while Thread A holds the lock, making the deadlock not possible.

I have also considered the following:

Thread A = interrupt context. Here Thread A cannot sleep (and doesn't) so we never get into the deadlock condition.
Thread A and B are both user context (i.e., concurrent calls to read() ). This cannot happen due to other mechanisms in place.

Is there anything that I am missing? In what I described above, is there any real danger related to sleeping while holding the lock?

...a driver which uses lots of locks... — design smells. — 0andriy

Tsyvarev Tsyvarev · Accepted Answer · 2018-05-20T23:14:48

I've always been taught that sleeping while holding a spinlock in kernel code is a no-no.

You confuse spinlocks and disabled interrupts (atomic context, IRQ).

Just sleeping while having a spinlock (acquired with spin_lock) harms perfomance, because the other thread, wanting to lock the spinlock, wastes a time with busy wait. But otherwise the system is OK.
But sleeping with disabled interrupts means the CPU core is dead: it executes nothing and doesn't react on interrupts from the other cores and outer world.

This is what would happen, if your call spin_lock_irqsave followed by copy_from_user: the first operations disables interrupts, and the second may sleep.

Usually disabled interrupts accompanies spinlocks (by means of spin_lock_irqsave or similar). Otherwise, if IRQ thread will attempt to acquire the same spinlock, deadlock will be observed: owner thread cannot proceed because it is preempted, and IRQ thread cannot proceed because it waits on spinlock.

Disabling interrupts is not needed for taking some spinlock, if no IRQ thread(or a thread with interrupts disabled for other reason) locks given spinlock. When disabling interrupts is not needed, spin_lock can be used instead of spin_lock_irqsave. But it is very recommended to replace non-IRQ spinlocks with mutexes.

Linux Device Drivers: Sleeping whiile Holding Lock Revisited

2 Answers