
I'm writing an ARMv7 disassembler. The ways to switch between ARM and Thumb mode are clearly described in the ARM reference manual, but how do you know what mode a program starts in?

I am using Xcode, which compiles to Thumb by default, so I know that all of my own programs will start in Thumb unless I force compilation to ARM mode. But I would like to be able to take an arbitrary Mach-O executable and find out the instruction set mode at the beginning of the code.

Is there somewhere in the Mach-O header that specifies the instruction set at the entry point?

It might depend on whether the start address is even or odd. - Ross Ridge
@RossRidge You mean halfword-aligned? Mine starts on a word-aligned address. - Magg G.
I mean even or odd. The BX/BLX instructions switch to ARM or Thumb mode depending on whether the address is even or odd. - Ross Ridge
@RossRidge But... the addresses are always even? - Magg G.
@MaggG. That's the subtlety: it's precisely because instruction addresses are always even that the bottom bit of a branch target address is free to be repurposed to indicate the target instruction set. - Notlikethat

3 Answers

8
votes

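The processor selects Thumb mode through the least-significant bit of a branch target address: when an interworking branch such as BX or BLX jumps to an odd address, the processor switches to Thumb mode (recording the current state in the T bit of the CPSR), and an even address selects ARM mode. The bit itself is ignored for the purpose of instruction fetching, so you switch between ARM and Thumb mode by setting or clearing this bit in the address you branch to.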
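To make the rule concrete for a disassembler, here is a minimal C sketch of how an interworking branch target could be interpreted, following the ARMv7 BX semantics. The names iset_t and decode_bx_target are hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Instruction set selected by an interworking branch such as BX or BLX (register). */
typedef enum { ISET_ARM, ISET_THUMB, ISET_UNPREDICTABLE } iset_t;

/* Decode a BX/BLX target: bit 0 = 1 selects Thumb and is cleared before
 * fetching. Bit 0 = 0 selects ARM, and then bit 1 must also be 0 because ARM
 * instructions are word-aligned (ARMv7 treats bits[1:0] == 0b10 as UNPREDICTABLE). */
static iset_t decode_bx_target(uint32_t target, uint32_t *fetch_addr)
{
    if (target & 1u) {
        *fetch_addr = target & ~1u;   /* Thumb: halfword-aligned fetch address */
        return ISET_THUMB;
    }
    *fetch_addr = target;             /* ARM: used as-is */
    return (target & 2u) ? ISET_UNPREDICTABLE : ISET_ARM;
}
```

For example, a target of 0x8001 decodes to Thumb code fetched from 0x8000, while 0x8000 decodes to ARM code at the same address.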

When you create an ARM binary, the linker sets the least-significant bit of a symbol's address depending on whether the symbol points to ARM or Thumb code, so the processor automatically picks the right mode on program start. You don't need to care about this.
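For the Mach-O case in the question, one place to look is the LC_UNIXTHREAD load command, whose ARM thread state records the initial pc and cpsr. The sketch below is a hedged illustration, not a verified recipe: it assumes a 32-bit little-endian ARM Mach-O that uses LC_UNIXTHREAD (newer executables may use LC_MAIN instead, which this sketch does not handle), and it assumes a Thumb entry is flagged either by bit 0 of pc or by the cpsr T bit (bit 5). Whether a given linker sets either one is worth checking against real binaries. The constants mirror values from <mach-o/loader.h> and <mach/arm/thread_status.h> so it builds without the SDK; macho_entry_mode and entry_mode_t are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirrored from the Apple headers (assumed, see lead-in). */
#define MH_MAGIC_32          0xFEEDFACEu
#define CPU_TYPE_ARM_32      12u
#define LC_UNIXTHREAD_CMD    0x5u
#define ARM_THREAD_STATE_FL  1u
#define ARM_THREAD_STATE_CNT 17u        /* r0-r12, sp, lr, pc, cpsr */
#define CPSR_T_BIT           (1u << 5)

typedef enum { ENTRY_UNKNOWN, ENTRY_ARM, ENTRY_THUMB } entry_mode_t;

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk the load commands of an in-memory image, find LC_UNIXTHREAD and report
 * the mode implied by its initial pc/cpsr. ENTRY_UNKNOWN means "not found". */
static entry_mode_t macho_entry_mode(const uint8_t *buf, size_t len, uint32_t *pc_out)
{
    if (len < 28 || rd32le(buf) != MH_MAGIC_32 || rd32le(buf + 4) != CPU_TYPE_ARM_32)
        return ENTRY_UNKNOWN;
    uint32_t ncmds = rd32le(buf + 16);            /* mach_header.ncmds */
    size_t off = 28;                              /* commands follow the 28-byte header */
    for (uint32_t i = 0; i < ncmds; i++) {
        if (off + 8 > len) return ENTRY_UNKNOWN;
        uint32_t cmd = rd32le(buf + off), cmdsize = rd32le(buf + off + 4);
        if (cmdsize < 8 || cmdsize > len - off) return ENTRY_UNKNOWN;
        if (cmd == LC_UNIXTHREAD_CMD) {
            size_t p = off + 8, end = off + cmdsize;
            while (p + 8 <= end) {                /* (flavor, count, state...) records */
                uint32_t flavor = rd32le(buf + p), count = rd32le(buf + p + 4);
                p += 8;
                if (count > (end - p) / 4) return ENTRY_UNKNOWN;
                if (flavor == ARM_THREAD_STATE_FL && count == ARM_THREAD_STATE_CNT) {
                    uint32_t pc = rd32le(buf + p + 15 * 4);    /* __pc */
                    uint32_t cpsr = rd32le(buf + p + 16 * 4);  /* __cpsr */
                    if (pc_out) *pc_out = pc & ~1u;
                    return ((pc & 1u) || (cpsr & CPSR_T_BIT)) ? ENTRY_THUMB : ENTRY_ARM;
                }
                p += (size_t)count * 4;
            }
        }
        off += cmdsize;
    }
    return ENTRY_UNKNOWN;
}
```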

2
votes

Most operating systems insert a bit of code before your application's entry point: the C runtime support. They launch your app in whatever mode that code is written in. That code then switches modes as necessary when calling into your main() or other entry point.

In the case of iOS, which is what I assume you're targeting since you're using Xcode, that code is in /usr/local/lib/crt0.o in your iOS SDK directory. Disassembling it shows that the symbol start is ARM code. That is, iOS apps always start running in ARM mode, but they can change mode very early thereafter.
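To make the same check without reading a disassembly, note that a Mach-O symbol table marks Thumb functions with the N_ARM_THUMB_DEF flag (0x0008) in the n_desc field of each nlist entry. Below is a minimal sketch assuming you have already located the 32-bit symbol and string tables (via LC_SYMTAB) and decoded the entries; struct nlist32 and symbol_is_thumb are hypothetical names, with the struct mirroring struct nlist from <mach-o/nlist.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the 32-bit 'struct nlist' from <mach-o/nlist.h>. */
struct nlist32 {
    uint32_t n_strx;    /* offset of the name in the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    int16_t  n_desc;    /* flags, including N_ARM_THUMB_DEF */
    uint32_t n_value;   /* symbol address */
};

#define N_ARM_THUMB_DEF_FLAG 0x0008   /* value of N_ARM_THUMB_DEF */

/* Returns 1 if the named symbol is a Thumb function, 0 if it is ARM,
 * and -1 if no symbol with that name is found. */
static int symbol_is_thumb(const struct nlist32 *syms, size_t nsyms,
                           const char *strtab, size_t strsize, const char *name)
{
    for (size_t i = 0; i < nsyms; i++) {
        uint32_t sx = syms[i].n_strx;
        if (sx >= strsize) continue;                        /* bad offset */
        const char *s = strtab + sx;
        if (memchr(s, '\0', strsize - sx) == NULL) continue; /* unterminated */
        if (strcmp(s, name) == 0)
            return (syms[i].n_desc & N_ARM_THUMB_DEF_FLAG) != 0;
    }
    return -1;
}
```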

1
votes

It depends on what you mean by the entry point, and the answer is in that definition. An operating system has to have a definition, because the code has to start in the right mode. Either the operating system always uses one mode (ARM, for example) and the code can then switch if it wants, or, if you use a file format like ELF that has an entry point field, you MIGHT get away with an even address meaning ARM and an odd address meaning Thumb, matching the BX/BLX instructions.
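For the ELF case, the odd/even reading of the entry point can be checked directly in the file header. A minimal sketch, assuming a 32-bit little-endian ARM ELF file already loaded into memory; elf_entry_mode is a hypothetical name, and the odd-means-Thumb interpretation of e_entry is the convention described above, which you might get away with rather than something guaranteed for every toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_EM_ARM 40u   /* EM_ARM in <elf.h> */

typedef enum { ENTRY_UNKNOWN, ENTRY_ARM, ENTRY_THUMB } entry_mode_t;

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read e_entry from an Elf32_Ehdr and apply the odd = Thumb / even = ARM rule. */
static entry_mode_t elf_entry_mode(const uint8_t *buf, size_t len, uint32_t *entry_out)
{
    static const uint8_t elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    if (len < 52 || memcmp(buf, elf_magic, 4) != 0)   /* 52 = sizeof(Elf32_Ehdr) */
        return ENTRY_UNKNOWN;
    if (buf[4] != 1 || buf[5] != 1)                    /* ELFCLASS32, ELFDATA2LSB */
        return ENTRY_UNKNOWN;
    uint16_t machine = (uint16_t)(buf[18] | buf[19] << 8);   /* e_machine */
    if (machine != ELF_EM_ARM)
        return ENTRY_UNKNOWN;
    uint32_t entry = rd32le(buf + 24);                 /* e_entry */
    if (entry_out) *entry_out = entry & ~1u;
    return (entry & 1u) ? ENTRY_THUMB : ENTRY_ARM;
}
```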

If you are talking about the cores themselves, an ARMv7-M core always starts in Thumb mode and has to remain there. ARMv7-A and ARMv7-R cores start in ARM mode on reset (the state for other exceptions is defined in the ARM documentation, likely ARM mode), and the code can then switch.
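On ARMv7-M this shows up in the vector table: word 0 holds the initial stack pointer and word 1 the reset vector, whose bit 0 the core loads into the Thumb (T) bit. Because the core can only execute Thumb code, a reset vector with bit 0 clear leads to a fault on the first instruction. A minimal sketch, assuming a hypothetical little-endian raw image that begins with the vector table; read_v7m_reset and m_profile_reset_t are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_sp;     /* vector table word 0: initial main stack pointer */
    uint32_t reset_handler;  /* word 1 with bit 0 cleared: first instruction address */
    int      thumb_bit_set;  /* 1 if bit 0 of the reset vector is set, as required */
} m_profile_reset_t;

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the first two vector table entries of an ARMv7-M image. */
static m_profile_reset_t read_v7m_reset(const uint8_t *image, size_t len)
{
    m_profile_reset_t r = { 0, 0, 0 };
    if (len < 8) return r;
    r.initial_sp = rd32le(image);
    uint32_t vec = rd32le(image + 4);
    r.reset_handler = vec & ~1u;
    r.thumb_bit_set = (vec & 1u) != 0;
    return r;
}
```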

If you are just trying to disassemble some generic object file, you might not be able to figure it out. Visually, as a human looking at an ARM binary in hex: if you see a lot of 0xE's (at the start of every word), that is likely ARM code; if you see 0x6 or 0x7 and few or no 0xE's (at the start of every halfword), that is probably Thumb code. But that is not something you can rely on for this task, since the first few instructions are likely to be the ones that switch modes, if a switch is going to happen.
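The eyeball heuristic above can be turned into two ratios: the share of 32-bit words whose top nibble is 0xE (the ARM "always" condition code) and the share of 16-bit halfwords whose top nibble is 0x6 or 0x7 (16-bit Thumb load/store immediate forms). This is only a rough sketch assuming little-endian code; the function names are hypothetical, and the result is weak evidence at best, since 32-bit Thumb-2 instructions also begin with a halfword whose top nibble is 0xE or 0xF.

```c
#include <stddef.h>
#include <stdint.h>

/* Fraction of aligned little-endian 32-bit words with top nibble 0xE. */
static double arm_cond_always_ratio(const uint8_t *buf, size_t len)
{
    size_t words = len / 4, hits = 0;
    if (words == 0) return 0.0;
    for (size_t i = 0; i < words; i++)
        if ((buf[i * 4 + 3] >> 4) == 0xE)   /* byte 3 holds bits 31..28 */
            hits++;
    return (double)hits / (double)words;
}

/* Fraction of aligned little-endian halfwords with top nibble 0x6 or 0x7. */
static double thumb_ldst_ratio(const uint8_t *buf, size_t len)
{
    size_t halves = len / 2, hits = 0;
    if (halves == 0) return 0.0;
    for (size_t i = 0; i < halves; i++) {
        unsigned nib = buf[i * 2 + 1] >> 4;  /* byte 1 holds bits 15..12 */
        if (nib == 0x6 || nib == 0x7)
            hits++;
    }
    return (double)hits / (double)halves;
}
```

For instance, the ARM words 0xE3A00001 and 0xE12FFF1E give an ARM ratio of 1.0, while the Thumb halfwords 0x6800 and 0x7008 give a Thumb ratio of 1.0.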

Also, if it is an ELF file, you might be able to tell from the file's metadata (such as its symbol table); I think that is how the GNU tools figure it out, as they certainly don't detect it on the fly. So that is most likely how you want to do this: examine the ELF file. If this is a raw binary, just instructions and data... good luck.
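In ARM ELF files, the ABI defines mapping symbols named $a, $t and $d (optionally followed by a ".suffix") that mark where ARM code, Thumb code and data begin within a section, and GNU objdump consults them when disassembling ARM ELF objects. A minimal sketch of the lookup, assuming you have already extracted one section's mapping symbols into an array sorted by address; mapping_sym_t, classify_mapping_name and mode_at are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MAP_UNKNOWN, MAP_ARM, MAP_THUMB, MAP_DATA } map_kind_t;

typedef struct {
    uint32_t addr;
    const char *name;   /* "$a", "$t", "$d", optionally followed by ".suffix" */
} mapping_sym_t;

/* Classify a symbol name as an ARM ELF mapping symbol, or MAP_UNKNOWN. */
static map_kind_t classify_mapping_name(const char *name)
{
    if (name[0] != '$' || name[1] == '\0') return MAP_UNKNOWN;
    if (name[2] != '\0' && name[2] != '.') return MAP_UNKNOWN;
    switch (name[1]) {
    case 'a': return MAP_ARM;
    case 't': return MAP_THUMB;
    case 'd': return MAP_DATA;
    default:  return MAP_UNKNOWN;
    }
}

/* Kind of the last mapping symbol at or below addr (syms sorted by addr). */
static map_kind_t mode_at(const mapping_sym_t *syms, size_t n, uint32_t addr)
{
    map_kind_t kind = MAP_UNKNOWN;
    for (size_t i = 0; i < n && syms[i].addr <= addr; i++) {
        map_kind_t k = classify_mapping_name(syms[i].name);
        if (k != MAP_UNKNOWN) kind = k;
    }
    return kind;
}
```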