7
votes

While porting some of our drivers to SMP (on a PowerPC target), we observed some behavior that I'd like you to shed some light on:

  1. After a local_irq_disable() on a UP system, jiffies tends to freeze, i.e. the count stops incrementing. Is this expected? I thought the decrementer interrupt was 'internal' and would not be affected by a local_irq_disable() kind of call, since I expected that call to disable only local IRQ (external interrupt) processing. The system of course freezes as well. Then, after a local_irq_enable(), the jiffies count jumps, and it seems to be compensating for the 'time lapse' between the local_irq_disable() and local_irq_enable() calls.

  2. Doing the same on an SMP system (P2020 with 2 e500 cores) gives surprising results. Firstly, the module that is inserted to do this testing always executes on core 1. Further, sometimes it does not see the jiffies counter freeze and sometimes it does. Again, when the count does freeze, it jumps after a local_irq_enable(). I have no idea why this happens.
    On an SMP system, do both cores run a scheduler timer (which would explain why we sometimes do not see jiffies freeze), or is it just core 0?

Also, since the kernel timers rely on jiffies, would this mean that none of our kernel timers will fire once local_irq_disable() has been called? What happens if this is done on only one of the cores in an SMP system?

There are many other questions, but I guess these will be enough to begin a general discussion :)

TIA

NS

Some more comments from the experimentation done.

My understanding at this point is that since kernel timers depend on jiffies to fire, they won't actually fire on a UP system after I issue a local_irq_save(). In fact, some of our code is based on the assumption that issuing a local_irq_save() guarantees protection against both interrupts and kernel timers on the local processor.

However, when carrying out the same experiment on an SMP system, even with both cores executing a local_irq_save(), jiffies does NOT stop incrementing and the system doesn't freeze. How is this possible? Is Linux using some other mechanism to trigger timer interrupts on the SMP system, possibly IPIs? This also breaks our assumption that local_irq_disable() will protect the system against kernel timers running on the same core, at least.

How do we go about writing code that is safe against async events, i.e. interrupts and kernel timers, and is valid for both UP and SMP?


2 Answers

4
votes

local_irq_disable() only disables interrupts on the current core. On a single-core system that means everything is disabled (including timer interrupts), which is why jiffies is not updated. On SMP, sometimes you happen to disable interrupts on the core that is updating jiffies, and sometimes not. This is usually not a problem, because interrupts are supposed to be disabled only for very short periods, and all scheduled timers will fire after interrupts are enabled again.
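The behavior described above can be illustrated with a small user-space toy model. This is only a sketch, not kernel code: the names (`struct sim`, `sim_advance`, and so on) are made up for illustration, and it assumes the tick handler catches up from a free-running hardware counter, which is one plausible reason for the observed jump in jiffies after re-enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCORES 2

/* Hypothetical model of a two-core system with one tick interrupt. */
struct sim {
    uint64_t hw_ticks;     /* free-running hardware counter; never masked */
    uint64_t jiffies;      /* software tick count, advanced by the handler */
    int tick_core;         /* core that services the tick interrupt */
    bool irq_off[NCORES];  /* per-core "interrupts disabled" flag */
};

void sim_init(struct sim *s, int tick_core)
{
    assert(tick_core >= 0 && tick_core < NCORES);
    s->hw_ticks = 0;
    s->jiffies = 0;
    s->tick_core = tick_core;
    for (int i = 0; i < NCORES; i++)
        s->irq_off[i] = false;
}

/* Tick handler: account for every period elapsed since it last ran. */
static void sim_tick_handler(struct sim *s)
{
    s->jiffies = s->hw_ticks;
}

/* One tick period elapses in hardware. */
void sim_advance(struct sim *s)
{
    s->hw_ticks++;
    /* The handler only runs if the servicing core has interrupts enabled. */
    if (!s->irq_off[s->tick_core])
        sim_tick_handler(s);
}

void sim_irq_disable(struct sim *s, int core)
{
    assert(core >= 0 && core < NCORES);
    s->irq_off[core] = true;
}

void sim_irq_enable(struct sim *s, int core)
{
    assert(core >= 0 && core < NCORES);
    s->irq_off[core] = false;
    /* A tick that was pending on this core is taken now: jiffies jumps. */
    if (core == s->tick_core && s->jiffies != s->hw_ticks)
        sim_tick_handler(s);
}
```

In this model, disabling interrupts on the tick-servicing core freezes `jiffies` until they are re-enabled, at which point it jumps forward; disabling them on the other core has no effect on `jiffies`.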

How do you know that your module always runs on core 1? On current versions of the kernel, it may even be running on more than one core at the same time (unless you explicitly prevented that).

4
votes

There are several facets to this problem. Let's take them one by one.

1.

a)

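On x86, local_irq_save() simply saves the flags and clears the IF flag of the EFLAGS register on the current core (the corresponding hardware bit on PowerPC is MSR[EE]). IRQ handlers can still run concurrently on the other cores.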

global_irq_save() is not available because it would require interprocessor communication to implement, and it is not really needed anyway since local IRQ disabling is intended for very short periods of time only.

b)

Modern APICs allow dynamic distribution of IRQs among the cores present, and apart from rare exceptions, the kernel essentially programs the necessary registers to obtain a round-robin distribution of the IRQs.

The consequence is that if IRQs are disabled locally for long enough, the APIC will eventually deliver an IRQ to the core that has them disabled. From then on, the whole system stops receiving that particular IRQ until IRQs are finally re-enabled on the core that received the last IRQ of that type.
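The stall described above can be sketched with a user-space toy model. It is a deliberate simplification and not how any real APIC is programmed: the names (`struct rr_irq`, `rr_raise`, `rr_set_irq_off`) are hypothetical, and the model assumes strict round-robin delivery with a single IRQ line that stays blocked while one delivery is outstanding.

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 2

/* Hypothetical model of one IRQ line distributed round-robin. */
struct rr_irq {
    int next_core;         /* core that receives the next IRQ */
    int stuck_on;          /* core holding an undelivered IRQ, or -1 */
    bool irq_off[NCORES];  /* per-core "interrupts disabled" flag */
    unsigned handled;      /* IRQs whose handler has run */
    unsigned blocked;      /* IRQs raised while the line was stalled */
};

void rr_init(struct rr_irq *r)
{
    r->next_core = 0;
    r->stuck_on = -1;
    r->handled = 0;
    r->blocked = 0;
    for (int i = 0; i < NCORES; i++)
        r->irq_off[i] = false;
}

/* The device raises the IRQ. Returns true if a handler ran immediately. */
bool rr_raise(struct rr_irq *r)
{
    if (r->stuck_on >= 0) {
        /* An earlier IRQ is still waiting on a masked core. */
        r->blocked++;
        return false;
    }
    int core = r->next_core;
    r->next_core = (core + 1) % NCORES;
    if (r->irq_off[core]) {
        /* Delivered to a core that cannot take it: the line stalls. */
        r->stuck_on = core;
        return false;
    }
    r->handled++;
    return true;
}

/* Disable (off = true) or enable (off = false) interrupts on one core. */
void rr_set_irq_off(struct rr_irq *r, int core, bool off)
{
    assert(core >= 0 && core < NCORES);
    r->irq_off[core] = off;
    if (!off && r->stuck_on == core) {
        /* The waiting IRQ is finally handled and the line is released. */
        r->stuck_on = -1;
        r->handled++;
    }
}
```

With interrupts disabled on core 1, the first IRQ is handled on core 0, the second gets stuck on core 1, and every later one is blocked until core 1 re-enables interrupts.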

2.

As for the differing results on jiffies updates when IRQs are disabled, that depends on the selected clocksource.

You can find out which one is chosen by consulting:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource

If you have tsc as the clocksource, then every core has it locally. However, if your clocksource is something else, e.g. HPET (an external device), then jiffies will freeze for the reasons described in point #1.
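The same check can be done from a small user-space C program. The following is a minimal sketch: `read_first_line` and `CLOCKSOURCE_PATH` are names introduced here for illustration, and the sysfs path is the one from the command above, which may not exist on every system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sysfs file quoted in the answer; may be absent on some systems */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of `path` into `buf` (at most len - 1 characters),
 * stripping the trailing newline. Returns 0 on success, -1 on failure.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL);
    if (len == 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the current clocksource name, or a note if it cannot be read. */
void print_clocksource(void)
{
    char name[64];

    if (read_first_line(CLOCKSOURCE_PATH, name, sizeof name) == 0)
        printf("current clocksource: %s\n", name);
    else
        printf("could not read %s\n", CLOCKSOURCE_PATH);
}
```

Calling `print_clocksource()` from `main` prints the name the kernel reports, or a short note if the file cannot be read.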