Why is the IRQ latency in my ARM interrupt handler always the same, regardless of the instruction that is being interrupted?

Question

I am trying to apply a side-channel attack I read about in this paper, which infers execution state from differences in IRQ latency, on an MCU with a Cortex-M4 processor. The attack carefully interrupts instructions that occur right after a branch and measures the interrupt latency. When the branches contain instructions of different lengths (cycle counts), the interrupt latency reveals which branch was executing when the interrupt occurred, leaking some of the program state.
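To make the premise concrete, here is a toy model: two branches whose first instruction differs in cycle count (1 vs 3 cycles, hypothetical numbers), and an IRQ that lands at the start of that instruction. The latency rule (wait for the in-flight instruction to finish, then pay a fixed entry cost) and the names `toy_latency`, `toy_guess_branch` and `TOY_ENTRY_CYCLES` are assumptions for illustration only, not a description of how a real Cortex-M4 behaves.

```c
#include <assert.h>

/* TOY MODEL (assumption for illustration, not measured Cortex-M4 behaviour):
 * an IRQ asserted while an instruction is executing waits for that
 * instruction to finish, then pays a fixed entry cost. */
#define TOY_ENTRY_CYCLES 12u /* hypothetical fixed exception-entry cost */

/* Latency the handler would observe if the IRQ is asserted at irq_cycle and
 * the instruction occupies cycles [start, start + length). */
static unsigned toy_latency(unsigned start, unsigned length, unsigned irq_cycle)
{
    unsigned end = start + length;
    assert(irq_cycle >= start); /* the IRQ lands during or after the instruction */
    return TOY_ENTRY_CYCLES + (irq_cycle < end ? end - irq_cycle : 0u);
}

/* Attacker's guess: whichever branch's expected latency is closer to the
 * measured one ('A' on a tie). */
static char toy_guess_branch(unsigned measured, unsigned expected_a,
                             unsigned expected_b)
{
    unsigned da = measured > expected_a ? measured - expected_a : expected_a - measured;
    unsigned db = measured > expected_b ? measured - expected_b : expected_b - measured;
    return da <= db ? 'A' : 'B';
}
```

With both instructions starting at cycle 100 and the IRQ asserted at cycle 100, the model gives 13 cycles for branch A and 15 for branch B, so `toy_guess_branch(15, 13, 15)` returns 'B'.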

I wrote a simple function that I want to attack in the way described above. I am using the SysTick timer to generate the interrupt at the correct point in time. To get a good initial value for the timer, I used GDB to stop the program at the target line and read the SysTick value at that point.

I implemented a very simple interrupt handler that:

- disables SysTick,
- reads the SysTick current value register (VAL),
- subtracts this value from the reload value to get the time elapsed since the interrupt (i.e. the IRQ latency), and
- clears the counter by writing 0 to VAL.

void __attribute__((interrupt("IRQ"))) SysTick_Handler(void)
{
  /* USER CODE BEGIN SysTick_IRQn 0 */
    SysTick->CTRL &= ~SysTick_CTRL_ENABLE_Msk;                  // disable SysTick (same as &= 0xfffffffe)
    *timer_value = SysTick->VAL;                                // capture counter value (as quickly as possible)
    *timer_value = SysTick->LOAD - *timer_value;                // subtract it from reload value to get IRQ latency
    SysTick->VAL = 0;                                           // clear counter (any write clears VAL and COUNTFLAG)
}

However, I always get the same IRQ latency, regardless of which instruction was interrupted. I expected the latency to be longer when a longer (multi-cycle) instruction is interrupted.

This is the function I wrote to test the attack:

extern uint32_t *timer_value;
int sample_function(int *a, int *b){
    /*
     * function description -- if *a < *b, swap the two values so that a holds the larger one
     * and b the smaller one; if SPY is defined, also write the SysTick VAL reading to *timer_value
     * r0 contains pointer to a
     * r1 contains pointer to b
     * (the asm assumes r0/r1 still hold the arguments when it starts)
     */

    __asm volatile(
        /*  push working registers */
        "PUSH {r4-r8} \n"
        /* move loop counter into r8 */
        "MOV r8, #10 \n"
        /* begin loop */
        "begin_loop: \n"
        /* decrement counter variable */
        "SUB r8, r8, #1 \n"
        /* compare counter variable with 0 */
        "CMP r8, #0 \n"
        /* if r8 not equal to 0, jump back to begin of loop */
        "BNE begin_loop \n"
        /* load a into r2 */
        "LDR r2, [r0] \n"
        /* load b into r3 */
        "LDR r3, [r1] \n"
        /* store a-b in r4, setting status flags */
        "SUBS r4, r2, r3 \n"
        /* if a-b is non-negative (N clear, ignoring overflow), a is already the larger: skip the swap */
        "BPL a_larger \n"
#ifdef SPY
        /* load address of the pointer variable timer_value into r4 -- LDR pseudo-instruction uses a literal pool */
        "LDR r4, =timer_value \n"
        /* load the pointer itself (timer_value) into r4 */
        "LDR r4, [r4] \n"
        /* load address of SysTick VAL into r5 */
        "LDR r5, =0xe000e018 \n"
        /* load SysTick VAL into r5 */
        "LDR r5, [r5] \n"
        /* store SysTick VAL to *timer_value */
        "STR r5, [r4] \n"
#endif
        "NOP \n"
        /* instruction that gets interrupted -- store a into b (first half of the swap) */
        "STR r2, [r1] \n"
        /* store b into a (second half of the swap) */
        "STR r3, [r0] \n"
        "B end \n"
        "a_larger: \n"
        "MOV r0, #0 \n"              // instruction that gets interrupted
        /* pop working registers */
        "end: POP    {r4-r8}"
        : /* no outputs */
        : /* no inputs */
        : "r0", "r2", "r3", "cc", "memory"   /* state modified by the asm */
            );
}

Note: the code in the #ifdef SPY block is used to automatically determine a good timer reload value (instead of using GDB), but I'm currently not using the value I obtain this way. I also added an otherwise empty loop to delay the instruction that is meant to be interrupted a little.

The instruction that gets interrupted is the one right after the #ifdef block. When I remove the NOP instruction, I still get the same interrupt latency. When I increase or decrease the timer reload value (to interrupt a few cycles earlier or later), I also still get the same IRQ latency.

Am I missing something here? Is there some behavior I don't know about? Also, is it important to use the attribute __attribute__((interrupt("IRQ"))) for an interrupt handler?

I can't imagine how GDB could get an accurate timer read. This looks like a ton of code for whatever you are trying to do. Do the test in assembly language, both the foreground task and the interrupt; that way you don't have to push any registers and can simply have a few-line ISR, ldr rd,[rn]; bx lr, where the ldr reads the SysTick timer. Set everything up for the test ahead of time. - old_timer
What is your definition of a long instruction? The foreground task can be a loop of things like NOPs, then things like stores, then things like reads, and so on. - old_timer
There will no doubt be latency differences, as the bus(es) can be loaded differently, affecting the fetch time of the ISR, the execution of the ISR, etc. (Well, for SysTick it is an exception handler, not an interrupt handler.) But can you get to the point where you can see them? - old_timer
There is no point in setting the timeout value to target a specific instruction, if that was implied here; just pick a periodic interrupt period. The foreground task can, say, examine a register that you increment in the handler, then branch off and print the one or two items the handler collected, then go back to a loop made mostly of the instruction(s) you hope to interrupt. So the period needs to be long enough to allow all of that to happen, once a second for example. - old_timer
You definitely don't want to build for debug, and I can't see how a debugger helps here, though I can see how it gets in the way. - old_timer
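A fixed periodic interrupt, as suggested in these comments, only needs a reload value. A small sketch, assuming the SysTick input clock frequency is known; the helper names and the clock figures in the example are hypothetical. SysTick's reload register is 24 bits wide and one period is RELOAD + 1 ticks, so a one-second period only fits for clocks up to about 16.7 MHz.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick reload is 24 bits; the counter runs RELOAD, RELOAD-1, ..., 0,
 * so one period is RELOAD + 1 ticks. */
#define SYSTICK_MAX_RELOAD 0x00FFFFFFu

/* Reload value for a period of period_ticks SysTick clock ticks.
 * Returns 0 if the period cannot be represented (a reload of 0 never fires
 * the exception, so it doubles as an error marker here). */
static uint32_t systick_reload_for_ticks(uint64_t period_ticks)
{
    if (period_ticks < 2u || period_ticks - 1u > SYSTICK_MAX_RELOAD)
        return 0u;
    return (uint32_t)(period_ticks - 1u);
}

/* Period in milliseconds; clock_hz is ASSUMED to be the SysTick input clock
 * (the core clock when CLKSOURCE = 1). */
static uint32_t systick_reload_for_ms(uint32_t clock_hz, uint32_t period_ms)
{
    assert(clock_hz != 0u);
    return systick_reload_for_ticks((uint64_t)clock_hz * period_ms / 1000u);
}
```

For example, a 16 MHz clock gives 15999999 for a 1000 ms period, while a 168 MHz clock cannot produce a 1 s period (the helper returns 0) and needs something shorter, such as 50 ms (8399999).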

old_timer · Accepted Answer · 2020-12-13T16:41:52

This is what I was thinking and commenting on.

Bootstrap:

.thumb_func
reset:
    bl notmain
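    ldr r4,=0xE000E018      @ r4 = SYST_CVR address, left in r4 for the handler
    ldr r0,=0xE000E010      @ r0 = SYST_CSR address
    mov r1,#7               @ ENABLE | TICKINT | CLKSOURCE
    str r1,[r0]             @ start SysTick
    b hang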
.thumb_func
hang:   
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    b hang

Set up the UART and SysTick:

void notmain ( void )
{
    uart_init();
    hexstring(0x12345678);
    
    PUT32(STK_CSR,4);
    PUT32(STK_RVR,0xF40000);
    PUT32(STK_CVR,0x00000000);
    //PUT32(STK_CSR,7);
}
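For reference, the SysTick registers and SYST_CSR bits behind the values used here: `notmain` writes 4 (CLKSOURCE = processor clock, counter still stopped), loads the reload value and clears the current value, and `reset` then writes 7 (ENABLE | TICKINT | CLKSOURCE) just before entering `hang`. The addresses and bit positions are the ARMv7-M architectural ones (STK_CSR, STK_RVR and STK_CVR in the answer correspond to SYST_CSR, SYST_RVR and SYST_CVR); the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* SysTick registers in the ARMv7-M System Control Space. */
#define SYST_CSR 0xE000E010u /* control and status */
#define SYST_RVR 0xE000E014u /* reload value (24 bits) */
#define SYST_CVR 0xE000E018u /* current value; any write clears it and COUNTFLAG */

/* SYST_CSR bit fields. */
#define CSR_ENABLE    (1u << 0)  /* counter enabled */
#define CSR_TICKINT   (1u << 1)  /* counting down to 0 pends the SysTick exception */
#define CSR_CLKSOURCE (1u << 2)  /* 1 = processor clock, 0 = external reference */
#define CSR_COUNTFLAG (1u << 16) /* read-only: counter reached 0 since last read */

/* Hypothetical helper: build a SYST_CSR value from its three control bits. */
static uint32_t csr_value(int enable, int tickint, int processor_clock)
{
    return (enable ? CSR_ENABLE : 0u) | (tickint ? CSR_TICKINT : 0u) |
           (processor_clock ? CSR_CLKSOURCE : 0u);
}

/* Hypothetical helper: print which fields a SYST_CSR value has set. */
static void describe_csr(uint32_t csr)
{
    printf("0x%08X:%s%s%s%s\n", (unsigned)csr,
           (csr & CSR_ENABLE) ? " ENABLE" : "",
           (csr & CSR_TICKINT) ? " TICKINT" : "",
           (csr & CSR_CLKSOURCE) ? " CLKSOURCE" : "",
           (csr & CSR_COUNTFLAG) ? " COUNTFLAG" : "");
}
```

`csr_value(0, 0, 1)` yields the 4 written in `notmain`, and `csr_value(1, 1, 1)` yields the 7 written in `reset`.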

Event handler:

.thumb_func
.globl systick_handler
systick_handler:
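    ldr r0,[r4]             @ read SYST_CVR first
    ldr r5,[sp,#0x18]       @ stacked PC = address of the interrupted instruction
    push {r0,lr}            @ save EXC_RETURN (two words keep SP 8-byte aligned)
    bl hexstrings           @ print the timer value
    mov r0,r5
    bl hexstring            @ print the interrupted address
    pop {r0,pc}             @ exception return via EXC_RETURN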

Grab the timer value and the address of the interrupted instruction, and print them out:
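The `ldr r5,[sp,#0x18]` in the handler works because, on exception entry, the hardware pushes an eight-word frame, and the handler reads it before pushing anything itself, so SP still points at the stacked r0. A sketch of that basic (no floating-point state) frame as a C struct; the type name is hypothetical, the layout is the architectural one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic ARMv7-M exception stack frame (no FP state), lowest address first.
 * At handler entry SP points at r0. The type name is hypothetical. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address: where execution resumes */
    uint32_t xpsr;
} exception_frame;

/* The handler's [sp,#0x18] is the stacked PC. */
_Static_assert(offsetof(exception_frame, pc) == 0x18, "stacked PC at SP+0x18");
_Static_assert(sizeof(exception_frame) == 0x20, "basic frame is 8 words");
```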

00F3FFF4 08000054 
00F3FFF4 08000056 
00F3FFF4 08000058 
00F3FFF4 0800005A 
00F3FFF4 0800005C 
00F3FFF4 0800005E 
00F3FFF4 08000054 
00F3FFF4 08000056 
00F3FFF4 08000058 
00F3FFF4 0800005A 
00F3FFF4 08000050 


08000050 <hang>:
 8000050:   bf00        nop
 8000052:   bf00        nop
 8000054:   bf00        nop
 8000056:   bf00        nop
 8000058:   bf00        nop
 800005a:   bf00        nop
 800005c:   bf00        nop
 800005e:   e7f7        b.n 8000050 <hang>
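Reading the log: `notmain` loads 0xF40000 as the reload value, and every sample shows 0x00F3FFF4 in the first column, so the counter had always decremented the same number of times, 0xF40000 - 0xF3FFF4 = 12, by the time the handler's first `ldr` read it, whichever `nop` was interrupted. With CSR = 7, CLKSOURCE selects the processor clock, so these ticks are core cycles. That is in line with the twelve-cycle figure in the ARM quote, but since the count starts one tick after the 1-to-0 transition that pends the exception and ends during the handler's first load, I would treat it as approximate. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed since the SysTick counter was reloaded with `reload`, given
 * the value `current` read from SYST_CVR. SysTick counts down, so this is
 * simply reload - current. Hypothetical helper. */
static uint32_t systick_elapsed(uint32_t reload, uint32_t current)
{
    assert(reload <= 0x00FFFFFFu && current <= reload); /* 24-bit counter */
    return reload - current;
}
```

`systick_elapsed(0xF40000, 0xF3FFF4)` gives 12 for every line of the log above.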

From ARM's documentation.

Interrupt Latency

There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR when the memory being accessed has no wait states being applied. When the FPU option is implemented and a floating point context is active and the lazy stacking is not enabled, this maximum latency is increased to twenty nine cycles. The first instructions to be executed are fetched in parallel to the stack push.

That last sentence is perhaps what we can see happening here. You can try various instructions, but this architecture is able to restart long-duration instructions (reads, push/pop, multiply, and such). I think that to see much of a latency difference, you may need to create bus or shared-resource contention rather than vary the instructions.

Also, SysTick is an exception, not an interrupt, so there may be some differences with respect to latency.
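To make the exception-versus-interrupt distinction concrete: in the ARMv7-M numbering, SysTick is system exception 15, while external interrupts start at exception 16 (CMSIS numbers IRQs relative to 16, hence SysTick_IRQn = -1). This sketch only encodes the numbering and the vector table offset; it says nothing about latency, and the helper names are hypothetical.

```c
#include <assert.h>

/* ARMv7-M exception numbers: 1-15 are system exceptions (SysTick is 15),
 * 16 and up are external interrupts routed through the NVIC. */
#define EXC_SYSTICK        15
#define EXC_FIRST_EXTERNAL 16

/* Hypothetical helpers. */
static int exc_is_external_irq(int exc_num) { return exc_num >= EXC_FIRST_EXTERNAL; }

/* CMSIS IRQn_Type numbering: external IRQ 0 is exception 16. */
static int exc_to_cmsis_irqn(int exc_num) { return exc_num - EXC_FIRST_EXTERNAL; }

/* Byte offset of the exception's entry in the vector table (4 bytes each). */
static unsigned exc_vector_offset(int exc_num)
{
    assert(exc_num > 0); /* entry 0 holds the initial SP, not a handler */
    return 4u * (unsigned)exc_num;
}
```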
