5
votes

Here is how I understand it:

  • The PC register holds a pointer to the next instruction
  • The LDR instruction loads the value at the address given by the second operand into the first operand. For example,
    LDR r0, [pc, 0x5678]
    is equivalent to this "C code":
    r0 = *(pc + 0x5678)
    
    It's a pointer dereference with a base offset.

And my question:

I found this code

LDR PC, [PC,-4]

It's commented as being for monkey patching, etc.

Here is how I understand this code:

pc = *(pc - 4)

In this case the load dereferences the address of the previous instruction, so "pc" will contain the "machine code" of that instruction (not its address). The program will then jump to that invalid address to continue execution, and we will probably get a "Segmentation Fault". So what am I missing or misunderstanding?



What makes me wonder is the brackets around the second operand of the LDR instruction. As far as I know, on the x86 architecture brackets already dereference the pointer, but I can't understand what they mean on the ARM architecture.

mov r1, 0x5678
add r1, pc
mov r0, [r1]

Is this code equivalent to the following?

LDR r0, [pc, 0x5678]
2
Re the edit: mov cannot take a memory operand (ARM is a load-store architecture), so that code is invalid as is. If the third instruction were ldr r0, [r1], it would be equivalent. ldr r0, [pc, 0x5678] can't be encoded as a single instruction because the immediate is too big (i.e. it doesn't fit in ldr's 12-bit offset field, which allows a magnitude of at most 4095). - Notlikethat
@Notlikethat thanks, that was my problem. - l0gg3r
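
The encoding limits raised in the comment can be checked with a small sketch. It assumes ARM (A32) state, where the immediate offset of LDR is a 12-bit magnitude plus a separate add/subtract bit, while data-processing immediates (such as the operand of MOV) are an 8-bit value rotated right by an even amount. The function names are made up for illustration.

```python
def fits_ldr_offset(offset: int) -> bool:
    """A32 LDR (immediate): 12-bit magnitude, sign carried by a separate bit."""
    return abs(offset) <= 0xFFF


def fits_dp_immediate(value: int) -> bool:
    """A32 data-processing immediate: 8-bit value rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return True
    return False


print(fits_ldr_offset(-4))        # small offsets such as -4 fit
print(fits_ldr_offset(0x5678))    # too large for a single LDR
print(fits_dp_immediate(0x5678))  # also not a valid rotated 8-bit immediate
```

Under these rules 0x5678 fails both checks, so the address would have to be built in more than one instruction or loaded from a nearby literal.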

2 Answers

10
votes

Quoting from section 4.9.4 of the ARM Instruction Set document (ARM DDI 0029E):

When using R15 as the base register you must remember it contains an address 8 bytes on from the address of the current instruction.

So that instruction will load the word located 4 bytes after the current instruction, which hopefully contains a valid address.
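
As a rough illustration of that arithmetic, here is a minimal sketch that models memory as a dictionary of words. The addresses 0x8000 and 0x8004 and the stored target 0x00012340 are made-up values, and the helper name is hypothetical; only the "+8" rule comes from the quoted manual.

```python
def ldr_pc_relative_address(instr_addr: int, offset: int) -> int:
    """Address accessed by LDR Rd, [PC, #offset] in ARM state."""
    # Reading PC (R15) as a base yields the instruction's own address plus 8.
    return (instr_addr + 8 + offset) & 0xFFFFFFFF


# Hypothetical layout: the LDR at 0x8000, a data word right after it.
memory = {
    0x8000: 0xE51FF004,  # LDR PC, [PC, #-4]
    0x8004: 0x00012340,  # assumed branch target stored as data
}

load_addr = ldr_pc_relative_address(0x8000, -4)
new_pc = memory[load_addr]
print(hex(load_addr), hex(new_pc))
```

With an offset of -4 the load address comes out as the instruction's address plus 4, not minus 4, so the value fetched is the data word and not the previous instruction's machine code.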

4
votes

Thanks to a quirk of the ARM architecture, LDR PC, [PC,-4] branches to an address stored in memory in the word immediately following the instruction (assuming we're talking ARM, not Thumb here).

(My original answer described it as a branch to the following instruction, with no effect under normal circumstances other than on performance, which code could patch at runtime by rewriting the bottom 12 bits of the instruction to change the offset, thus redirecting that function somewhere else. Herp derp, I got ADR and LDR confused there: that would be true if it were ADR, but this case is even more straightforward.)

Now that I've unconfused myself it's just a simple function call trampoline. The function address will be stored as a data word immediately following the LDR instruction (presumably set to some initial value by the linker) and can simply be rewritten as data at runtime to redirect the branch, without needing to resort to self-modifying code.
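
A minimal simulation of such a trampoline might look like the sketch below. It assumes ARM (A32) state; the memory layout, the two target addresses, and the helper names are hypothetical. The point it tries to show is that redirecting the branch only touches the data word, while the instruction word stays the same.

```python
LDR_PC_PC_MINUS_4 = 0xE51FF004  # A32 encoding of LDR PC, [PC, #-4]


def decode_ldr_pc_relative(word):
    """Return (rd, offset) if word is an A32 'LDR Rd, [PC, #+/-imm12]', else None."""
    # Fixed bits: 27..25 = 010, P = 1, B = 0, W = 0, L = 1, Rn = 15 (PC).
    if (word & 0x0F7F0000) != 0x051F0000:
        return None
    imm12 = word & 0xFFF
    add = (word >> 23) & 1  # U bit: 1 adds the offset, 0 subtracts it
    rd = (word >> 12) & 0xF
    return rd, (imm12 if add else -imm12)


def run_trampoline(memory, addr):
    """Simulate executing the trampoline at addr and return the new PC."""
    rd, offset = decode_ldr_pc_relative(memory[addr])
    assert rd == 15, "destination must be PC for the load to act as a branch"
    # PC reads as the instruction address plus 8 in ARM state.
    return memory[addr + 8 + offset]


# Hypothetical layout: trampoline at 0x8000, its target stored at 0x8004.
memory = {0x8000: LDR_PC_PC_MINUS_4, 0x8004: 0x00010000}
before = run_trampoline(memory, 0x8000)

memory[0x8004] = 0x00020000  # redirect by rewriting the data word only
after = run_trampoline(memory, 0x8000)

print(hex(before), hex(after))
```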