
There are many Q&As about spinlocks, but it's still confusing to me. I think that's because the questions and answers assume different settings, or don't clearly state whether they're talking about SMP or a preemptive kernel (and some outdated information is mixed in too).

My first question is: (Q1) in an SMP system, does schedule() run on every processor concurrently (I know scheduling is started by the jiffies timer interrupt)? I'll assume yes in my question below. I'd also appreciate a brief explanation of how processes move among processor cores during scheduling.

I'm trying to understand how, why, and when spin_lock_irqsave/spin_unlock_irqrestore are used. Here is my question.

Suppose there is code that calls spin_lock_irqsave(), and interrupts were already disabled at the time of the call. Could this code be running in interrupt context? Probably not, because the ISR should not have started in the first place if interrupts were disabled on the local processor. Therefore, the code calling spin_lock_irqsave() must be in process context. So interrupts had been disabled previously, and now a process is trying to take the lock with spin_lock_irqsave().

In what cases could interrupts have been disabled? I think there are two.

Case 1: a previous interrupt routine had been preempted by this process (the one calling spin_lock_irqsave). This is weird because an ISR cannot be preempted. (Q2) By the way, in a preemptive kernel, can an ISR be preempted by a process? (Q3) I guess that because preempt_count() is #defined as (current_thread_info()->preempt_count), preempt_disable only applies to processes and not to interrupts. Do interrupts also have current thread info?

Case 2: a previous normal process had acquired the lock with spin_lock_irq (or irqsave). But this is also weird, because before locking, spin_lock_irq (or irqsave) disables preemption and interrupts for the task, telling the scheduler not to switch to another task at the next timer interrupt. So this case cannot be true either.

I know I need to read further about process scheduling on SMP and kernel preemption, and maybe I'm misunderstanding something. Could somebody clear this up? Thanks a lot for reading.


1 Answer


There are many Q&As about spinlocks, but it's still confusing to me. I think that's because the questions and answers assume different settings, or don't clearly state whether they're talking about SMP or a preemptive kernel (and some outdated information is mixed in too).

I can only agree. Spinlocks, while simple in nature, are not a simple topic at all when included in the context of modern Linux kernels. I don't think you can get a good understanding of spinlocks just by reading random and case-specific Stack Overflow answers.

I would strongly suggest reading Chapter 5: Concurrency and Race Conditions of the book Linux Device Drivers, which is freely available online. In particular, the "Spinlocks" section of Chapter 5 is very helpful for understanding how spinlocks are useful in different situations.

(Q1) in an SMP system, does schedule() run on every processor concurrently? [...] I'd also appreciate a brief explanation of how processes move among processor cores during scheduling.

Yes, you can look at it that way if you like. Each CPU (i.e. every single processor core) has its own timer, and when a timer interrupt is raised on a given CPU, that CPU executes the timer interrupt handler registered by the kernel, which calls the scheduler, which re-schedules processes.

Each CPU in the system has its own runqueue, which holds tasks that are in a runnable state. Any task can be included in at most one runqueue and cannot run on multiple different CPUs at the same time.

The CPU affinity of a task is what determines which CPUs a task can run on. The default "normal" affinity allows a task to run on any CPU (except in special configurations). Based on their affinity, tasks can be moved from one runqueue to another either by the scheduler or explicitly through the sched_setaffinity syscall (here's a related answer which explains how).

Suggested read: A complete guide to Linux process scheduling.

Suppose there is a code which calls spin_lock_irqsave, and the interrupt state (enable) was a 'disable' at the time of calling spin_lock_irqsave(). Could this code be running in interrupt context? Probably not.

Why not? This is possible. The code could be running in interrupt context, but not called by a different interrupt. See the bottom of my answer.

Case 1: a previous interrupt routine had been preempted by this process (which is calling this spin_lock_irqsave). This is weird because ISR cannot be preempted.

You're right, it's weird. More than weird though, this is impossible. On Linux, at all times, interrupts can either be enabled or disabled (there is no in-between). There isn't really a "priority" for interrupts (like there is for tasks), but we can classify them in two ranks:

  • Non-preemptible interrupts which necessarily need to run from start to finish with full control of the CPU. These interrupts put the system in the "disabled interrupts" state and no other interrupts can happen.
  • Preemptible interrupts which are re-entrant and allow other interrupts to happen. In case another interrupt happens when this interrupt is being serviced, you enter in a nested interrupt scenario, which is similar to the scenario of nested signal handlers for tasks.

In your case, since interrupts had previously been disabled, if the code that disabled them was an interrupt, it was a non-preemptible one, and therefore it could not have been preempted. It could also have been a preemptible interrupt executing a critical section that needs interrupts disabled, but the scenario is the same: you cannot be inside another interrupt.

(Q2) By the way, in preemptive kernel, can ISR be preempted by a process?

No. It's improper to say "preempted by a process". Processes do not really preempt anything, they are preempted by the kernel which takes control. That said, a preemptible interrupt could in theory be interrupted by another one that was for example registered by a process (I don't know an example case for this scenario unfortunately). I still wouldn't call this "preempted by a process" though, since the whole thing keeps happening in kernel space.

(Q3) [...] Do interrupts also have the current thread info?

Interrupt handlers live in a different world, they do not care about running tasks and do not need access to such information. You probably could get ahold of current or even current_thread_info if you really wanted, but I doubt that'd be of any help for anything. An interrupt is not associated with any task, there's no link between the interrupt and a certain task running. Another answer here for reference.

Case 2: a previous normal process had acquired the lock with spin_lock_irq (or irqsave). But this is also weird, because before locking, spin_lock_irq (or irqsave) disables preemption and interrupts for the task, telling the scheduler not to switch to another task at the next timer interrupt. So this case cannot be true either.

Yes, you're right. That's not possible.


The spin_lock_irqsave() function exists to be used in circumstances in which you cannot know if interrupts have already been disabled or not, and therefore you cannot use spin_lock_irq() followed by spin_unlock_irq() because that second function would forcibly re-enable interrupts. By the way, this is also explained in Chapter 5 of Linux Device Drivers, which I linked above.

In the scenario you describe, you are calling spin_lock_irqsave() and interrupts have already been disabled by something else. This means that any of the parent caller functions that ended up calling the current function must have already disabled interrupts somehow.

The following scenarios are possible:

  1. The original interrupt disable was caused by an interrupt handler and you are now executing another piece of code as part of the same interrupt handler (i.e. the current function has been called either directly or indirectly by the interrupt handler itself). You can very well have a call to spin_lock_irqsave() in a function that is being called by an interrupt handler. Or even just a call to local_irq_save() (the kfree() function does this for example, and it can surely be called from interrupt context).

  2. The original interrupt disable was caused by normal kernel code and you are now executing another piece of code as part of the same normal kernel code (i.e. the current function has been called either directly or indirectly by some other kernel function after disabling interrupts). This is completely possible, and in fact it's the reason why the irqsave variant exists.