
I'm using an STM32F401RE (ARM Cortex-M4, 32-bit RISC) and was curious about the following.

Normally instructions on a 32-bit ARM can be 2 or 4 bytes long. I accidentally jumped into the middle of a 2-byte instruction, and the microprocessor immediately ended up in an infinite error handler loop.

I later tested this by deliberately jumping into the middle of both a 4-byte and a 2-byte instruction, and the microprocessor always ended up in the error handler.

I used the following C code to jump to memory addresses.

void (*foo)(void) = (void (*)())0x80002e8;
foo( ) ;

The addresses of functions and instructions are taken from the disassembly. After storing the address in r3, the compiler emitted the following assembler instruction.

blx     r3

Question: How exactly can the Microprocessor tell that it didn't start at the beginning of an instruction but actually started in-between one?
Especially in the case of the 16-bit Thumb instructions, whose encoding space is already pretty cramped.

I have multiple guesses but want to know what exactly is going on.

1
It's not clear from your question exactly how you "jumped". Have you looked up the relevant instruction in the ARMv7-M Architecture Reference Manual and read the description? - Michael
void (*foo)(void) = (void (*)())0x80002e8; foo (); will always crash on a Cortex-M: the address is even, which indicates that the target uses the ARM (A32) instruction set, and the blx instruction (x = exchange) then requests a switch to ARM state, which the Cortex-M doesn't support. You have to set the lowest bit of the address to indicate a jump to a Thumb instruction, or use the bl instruction (without exchange). - Erlkoenig
@Erlkoenig I also used the C goto when I tested this. It makes sense now why the compiler added the orr.w r3, r3, #1 line. - negatic
Indirect goto is a GCC extension anyway... Jumping to arbitrary addresses directly from C is always problematic. Better to avoid it, or perhaps use (inline) assembly. - Erlkoenig
Maybe something like __asm__ volatile ("blx %[fun]" : : [fun]"r"(0x80002e9) : "memory"); so you can directly control how the call/jump works. The address can be any C expression that yields a 32bit value. - Erlkoenig
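
The bit-0 convention described in the comments above can be sketched in plain C. This is only an illustration: the helper names targets_thumb_state and to_thumb_target are made up for this sketch and are not part of any toolchain or library; the only thing taken from the thread is that blx/bx use bit 0 of the register value to select the instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, chosen for this sketch only. */

/* Nonzero if bit 0 of the branch target is set, i.e. a blx/bx through a
 * register holding this value would stay in Thumb state. */
static int targets_thumb_state(uintptr_t branch_target)
{
    return (branch_target & 1u) != 0;
}

/* Turns an instruction address as shown in a disassembly (always even)
 * into a value that can be used with blx/bx on a Thumb-only core. */
static uintptr_t to_thumb_target(uintptr_t code_address)
{
    assert((code_address & 1u) == 0); /* listing addresses are halfword aligned */
    return code_address | 1u;         /* set the Thumb bit */
}
```

With the address from the question, to_thumb_target(0x80002e8) yields 0x80002e9, the same value used in the inline-assembly comment above. Casting such an integer to a function pointer is still implementation-defined, so this should be treated as an experiment rather than a recommended pattern.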

1 Answer


Normally instructions on a 32-bit ARM can be 2 or 4 bytes long.

Only for Thumb-2; in Thumb they are all 2 bytes, and in ARM ("A32") mode they are all 4 bytes.

Question: How exactly can the Microprocessor tell that it didn't start at the beginning of an instruction but actually started in-between one?

It can't. If the 2 upper bytes of a 4-byte instruction happen to form a valid 2-byte instruction and you jump there, it will be executed as such. In your case, these upper 2 bytes probably were all invalid instructions, resulting in a fault exception.

For example, the program

.code 16
.syntax unified

test4byte:
    mov.w r0, #0x88000000
    
test2byte:
    ands r0, r1

will be assembled into

00000000 <test4byte>:
   0:   f04f 4008   mov.w   r0, #2281701376 ; 0x88000000

00000004 <test2byte>:
   4:   4008        ands    r0, r1

or as a byte-wise hex dump

4f f0 08 40 08 40

As you can see, the sequence 08 40 occurs twice: once as the upper 2 bytes of the mov.w and once as the ands instruction. The two are bit-for-bit identical, so the processor has no way to tell them apart.

In a program that just contained the shown mov.w instruction, if you jumped to address 0, the mov.w would be executed; if you jumped to address 2, an ands would be executed, even though it doesn't appear in the assembly code.
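
To make this concrete, here is a small host-side C sketch of the length decision a Thumb-2 decoder makes. It relies on the ARMv7-M rule that a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction, and anything else is a complete 16-bit instruction. The names code, halfword_at and thumb_insn_length are invented for this sketch; it models only the length decision, not a real decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise dump of the example program (little-endian halfwords). */
static const uint8_t code[] = { 0x4f, 0xf0, 0x08, 0x40, 0x08, 0x40 };

/* Reads the 16-bit halfword stored at an even byte offset. */
static uint16_t halfword_at(const uint8_t *mem, size_t offset)
{
    assert(offset % 2 == 0); /* Thumb instructions are halfword aligned */
    return (uint16_t)(mem[offset] | (mem[offset + 1] << 8));
}

/* Length in bytes of the Thumb instruction whose FIRST halfword is hw.
 * Bits [15:11] equal to 0b11101, 0b11110 or 0b11111 mean "32-bit";
 * every other pattern is a 16-bit instruction. */
static size_t thumb_insn_length(uint16_t hw)
{
    return (hw >> 11) >= 0x1d ? 4 : 2;
}
```

Calling thumb_insn_length(halfword_at(code, 0)) classifies 0xf04f as the start of a 4-byte instruction, while the same call at offset 2 sees 0x4008 and classifies it as a complete 2-byte instruction. The decision depends only on the halfword at the address where decoding starts, which is why no record of the "real" instruction boundaries is consulted.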