I'm writing threading code for a Cortex-M4F. Everything's working, and I'm now looking into making FPU context switching more efficient via lazy stacking.
I've read ARM's AN298 and implemented the alternative approach based on disabling the FPU and handling UsageFault, but the lower registers (S0-S15) are not being saved/restored correctly by the hardware. I think the problem lies in figure 11:
According to this, when PendSV runs, FPCAR should point to the space reserved in Task A's stack. But as I see it, since CONTROL.FPCA is high in Task C, FPCAR will be updated to point to Task C's stack when entering PendSV. If so, S0-S15 and FPSCR will be saved to Task C's stack instead of Task A's, which is of course not correct.
Am I missing something here, or is the appnote wrong?
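To make the concern concrete, here is a minimal host-side model of what I understand exception entry to do with FPCAR. It is only a sketch, not code that touches real registers: the struct and the function name `model_exception_entry` are made up for illustration, the stack addresses used with it are arbitrary, and it deliberately ignores stack alignment padding, the actual lazy save, and the CPACR/UsageFault handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Frame sizes in bytes (no alignment padding modelled). */
#define BASIC_FRAME_BYTES    0x20u /* R0-R3, R12, LR, PC, xPSR */
#define EXTENDED_FRAME_BYTES 0x68u /* basic frame + S0-S15 + FPSCR + reserved word */

/* Hypothetical model of the state relevant to lazy stacking. */
typedef struct {
    uint32_t sp;     /* active stack pointer (PSP of the running task) */
    uint32_t fpcar;  /* FPCAR: address reserved for S0 in the latest extended frame */
    bool     fpca;   /* CONTROL.FPCA */
    bool     lspen;  /* FPCCR.LSPEN */
    bool     lspact; /* FPCCR.LSPACT */
} fp_model_t;

/* Models exception entry only: frame allocation and the FPCAR/LSPACT update. */
void model_exception_entry(fp_model_t *m)
{
    if (m->fpca) {
        /* FP context active: allocate the extended frame on the current stack. */
        m->sp -= EXTENDED_FRAME_BYTES;
        if (m->lspen) {
            /* Lazy stacking: only record where S0 would go and mark it pending. */
            m->fpcar  = m->sp + BASIC_FRAME_BYTES;
            m->lspact = true;
        }
        m->fpca = false; /* the handler starts without an active FP context */
    } else {
        /* No FP context: basic frame only, FPCAR is left untouched. */
        m->sp -= BASIC_FRAME_BYTES;
    }
}
```

Running the model with Task A's stack, then pointing `sp` at Task C's stack with `fpca` set again and entering a second time, leaves `fpcar` inside Task C's stack. That is exactly the situation I'm worried about when PendSV is entered from Task C.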
On a side note, I checked some open source RTOSes. FreeRTOS and mbed RTOS always stack S16-S31 during the context switch, which triggers the automatic S0-S15 stacking, i.e. they use lazy stacking only to reduce interrupt latency but do full state preservation for tasks (as in the first approach outlined in the appnote). The TNKernel port for M4F uses the UsageFault approach, but fully saves/restores S0-S31 in software, effectively bypassing any problem with FPCAR (at the cost of 48 loads/stores instead of 32, since the 16 hardware ones get overwritten on restore). Nobody seems to be using the UsageFault approach while only preserving S16-S31.
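For reference, the 48-versus-32 figure can be reproduced with a small host-side sketch. The only architectural fact it relies on is that bit 4 of EXC_RETURN is 0 when the exception frame has the extended (FP) layout. The enum and function names are hypothetical, and FPSCR and the reserved word are left out of the count.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 4: 0 = extended (FP) frame, 1 = basic frame. */
#define EXC_RETURN_FTYPE    (1u << 4)

#define HW_STACKED_S_REGS   16u /* S0-S15, stacked (possibly lazily) by hardware */
#define CALLEE_SAVED_S_REGS 16u /* S16-S31, left to software */

/* Hypothetical software save policies for the context switch. */
typedef enum {
    SW_SAVE_S16_S31, /* software saves only the callee-saved half */
    SW_SAVE_S0_S31   /* software saves the full register file */
} sw_save_policy_t;

/* True if the interrupted context has floating-point state on its frame. */
bool task_has_fp_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE) == 0u;
}

/* Number of S-register transfers in one direction (save or restore). */
unsigned s_reg_transfers(uint32_t exc_return, sw_save_policy_t policy)
{
    if (!task_has_fp_context(exc_return)) {
        return 0u; /* basic frame: nothing FP-related to move */
    }
    unsigned sw = (policy == SW_SAVE_S0_S31)
                      ? HW_STACKED_S_REGS + CALLEE_SAVED_S_REGS
                      : CALLEE_SAVED_S_REGS;
    /* Hardware stacking of S0-S15 happens regardless of the software policy. */
    return HW_STACKED_S_REGS + sw;
}
```

With an EXC_RETURN of 0xFFFFFFED (thread mode, PSP, extended frame), `SW_SAVE_S16_S31` gives 32 transfers and `SW_SAVE_S0_S31` gives 48, while 0xFFFFFFFD (basic frame) gives 0 for either policy.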
(By the way, this is also posted at ARM Community, but a lot of questions seem to go unanswered there. If I get an answer there, I'll replicate it here, too)