How does the Microprocessor detect if it's in-between an instruction?

Question

I'm using an STM32F401RE (ARM Cortex-M4, 32-bit RISC) and was curious about the following.

Normally, instructions on a 32-bit ARM can be 2 or 4 bytes long. I accidentally jumped into the middle of a 2-byte instruction, and the microprocessor immediately ended up in an infinite Error Handler loop.

I later tested this by deliberately jumping into the middle of both 4-byte and 2-byte instructions, and the microprocessor always went into the Error Handler.

I used the following C code to jump to memory addresses:

void (*foo)(void) = (void (*)())0x80002e8;
foo();

The addresses of the functions and instructions are taken from the disassembly. After storing the address in r3, the compiler emitted the following assembler instruction:

blx     r3

Question: How exactly can the Microprocessor tell that it didn't start at the beginning of an instruction but actually started in-between one?
Especially in the case of the 16-bit Thumb instructions, whose encoding space is already pretty cramped.

I have multiple guesses but want to know what exactly is going on.

It's not clear from your question exactly how you "jumped". Have you looked up the relevant instruction in the ARMv7-M Architecture Reference Manual and read the description? — Michael
void (*foo)(void) = (void (*)())0x80002e8; foo(); will always crash on a Cortex-M: the address is even, which indicates that the target instruction set is ARM (A32), and Cortex-M cores don't support that, yet the blx instruction (x = exchange) requests the switch to ARM. You have to set the lowest bit of the address to indicate a jump to a Thumb instruction, or use the bl instruction (without exchange). — Erlkoenig
@Erlkoenig I also used the C goto when I tested this. It makes sense now why the compiler added the orr.w r3, r3, #1 line. — negatic
Indirect goto is a GCC extension anyway... Jumping to arbitrary addresses directly from C is always problematic. Better to avoid it, or perhaps use (inline) assembly. — Erlkoenig
Maybe something like __asm__ volatile ("blx %[fun]" : : [fun]"r"(0x80002e9) : "memory"); so you can directly control how the call/jump works. The address can be any C expression that yields a 32-bit value. — Erlkoenig
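
The bit-0 convention described in these comments can be sketched in plain C. This is only a minimal illustration, not code from the thread: the helper names thumb_target and is_thumb_target are hypothetical, and the address used in the note below is the one from the question.

```c
#include <stdint.h>

/* Hypothetical helper: set bit 0 so that BX/BLX keeps the core in Thumb state.
   On such a branch the CPU copies bit 0 into the T bit and fetches from the
   address with bit 0 cleared. */
static inline uint32_t thumb_target(uint32_t code_addr)
{
    return code_addr | 1u;
}

/* Hypothetical helper: nonzero if branching to this value via BX/BLX
   selects Thumb state (bit 0 set), zero if it would request ARM state. */
static inline int is_thumb_target(uint32_t branch_value)
{
    return (branch_value & 1u) != 0;
}
```

On the target, the result would then be cast to a function pointer, for example void (*foo)(void) = (void (*)(void))thumb_target(0x80002e8u);, which yields 0x80002e9 and has the same effect as the orr.w r3, r3, #1 mentioned above.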

Erlkoenig · Accepted Answer · 2020-07-24T07:25:21

Normally, instructions on a 32-bit ARM can be 2 or 4 bytes long.

Only for Thumb-2; in the original Thumb instruction set they are all 2 bytes, and in ARM ("A32") mode they are all 4 bytes.

Question: How exactly can the Microprocessor tell that it didn't start at the beginning of an instruction but actually started in-between one?

It can't. If the upper 2 bytes of a 4-byte instruction happen to form a valid 2-byte instruction and you jump there, they will be executed as such. In your case, these upper 2 bytes probably did not form a valid instruction, resulting in a fault exception.

For example, the program

.code 16
.syntax unified

test4byte:
    mov.w r0, #0x88000000
    
test2byte:
    ands r0, r1

will be assembled into

00000000 <test4byte>:
   0:   f04f 4008   mov.w   r0, #2281701376 ; 0x88000000

00000004 <test2byte>:
   4:   4008        ands    r0, r1

or as a byte-wise hex dump

4f f0 08 40 08 40

As you can see, the byte sequence 08 40 occurs twice: once as the upper 2 bytes of the mov.w and once as the ands instruction. The two encodings are identical, so the processor has no way to tell them apart.

In a program that just contained the shown mov.w instruction, if you jumped to address 0, the mov.w would be executed; if you jumped to address 2, an ands would be executed, even though it doesn't appear in the assembly code.
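
To make the ambiguity concrete, here is a small host-side C sketch (not part of the original answer) that applies the Thumb-2 length rule to the byte dump above. It relies on the ARMv7-M rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 4-byte instruction, and that any other halfword is a complete 2-byte instruction. The names example_dump, read_halfword and thumb_insn_len are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The byte-wise dump from above: mov.w r0, #0x88000000 followed by ands r0, r1. */
const uint8_t example_dump[6] = { 0x4f, 0xf0, 0x08, 0x40, 0x08, 0x40 };

/* Read one little-endian halfword at byte offset off. */
static inline uint16_t read_halfword(const uint8_t *mem, size_t off)
{
    return (uint16_t)(mem[off] | (mem[off + 1] << 8));
}

/* Length in bytes of the Thumb instruction whose first halfword is hw.
   The decoder only looks at the halfword it is handed; it has no record
   of where the assembler intended instructions to begin. */
static inline int thumb_insn_len(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}
```

Decoding from offset 0 sees the halfword 0xf04f and takes a 4-byte instruction. Decoding from offset 2 sees 0x4008 and takes a 2-byte instruction, which is exactly the halfword found at offset 4, so the length rule gives the same answer for the tail of the mov.w as for the real ands.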
