Smart disassembly of TBB in ARM Thumb?

Question

In ARM, TBB (table branch byte) is essentially a switch instruction: it reads a jump offset from a table that follows the instruction, indexed by the value passed in (the switch value), and then branches to the resulting address. ARM docs
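To make the address calculation concrete, here is a minimal C sketch for the PC-relative form. The function names are illustrative only. It assumes a little-endian image that is indexed directly by address, and that the table sits directly after the 4-byte TBB/TBH instruction, so the PC value the instruction reads (instruction address + 4 in Thumb state) is also the table base.

```c
#include <stdint.h>

/* Target of `tbb [pc, Rm]`: each table entry is an unsigned byte holding
 * half of the forward offset from the table base. */
static uint32_t tbb_target(const uint8_t *image, uint32_t insn_addr, uint32_t index)
{
    uint32_t table = insn_addr + 4;   /* Thumb PC = instruction address + 4 */
    return table + 2u * image[table + index];
}

/* Target of `tbh [pc, Rm, lsl #1]`: same idea with 16-bit entries
 * (assumed little-endian here). */
static uint32_t tbh_target(const uint8_t *image, uint32_t insn_addr, uint32_t index)
{
    uint32_t table = insn_addr + 4;
    uint32_t off = table + 2u * index;
    uint32_t entry = image[off] | ((uint32_t)image[off + 1] << 8);
    return table + 2u * entry;
}
```

Because entries are unsigned and doubled, a TBB can only branch forward, by at most 510 bytes from the table base.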

I am trying to automatically disassemble TBB (and TBH) tables, so that they are not disassembled as instructions and disassembly continues after the table. The trouble is that TBB tables are variable length, and the instruction does no bounds checking. Bounds checking must be done manually (or by a compiler) before the TBB.

The table has no terminator, and within the table any byte is a valid jump offset. Code begins again immediately after the table.

So, has anyone encountered (or could think of) a way to automatically determine the length of a TBB table? The best I have is to scour the instructions leading up to the TBB for the default case, but that seems like an inexact method.
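As a rough illustration of the "look at the instructions before the TBB" idea, the sketch below scans backwards for a 16-bit `cmp Rn, #imm8` on the index register and treats `imm8 + 1` as the entry count. Everything here is an assumption rather than a known-good method: the helper name is made up, the guard is assumed to be `cmp` followed by `bhi default` (a `bhs` guard would mean `imm8` entries instead), 32-bit `cmp.w` encodings are ignored, and a backward halfword scan can misread the second half of a 32-bit instruction as a `cmp`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian Thumb halfword at byte offset `off`. */
static uint16_t read_hw(const uint8_t *image, size_t off)
{
    return (uint16_t)(image[off] | (image[off + 1] << 8));
}

/*
 * Hypothetical helper: scan backwards from the TBB at `tbb_off` for a
 * 16-bit `cmp Rn, #imm8` (encoding 00101 Rn imm8) on `index_reg`.
 * Returns imm8 + 1 as the guessed entry count, or 0 if no such compare
 * is found within `max_back` halfwords.
 */
static size_t guess_tbb_entries(const uint8_t *image, size_t tbb_off,
                                unsigned index_reg, size_t max_back)
{
    assert(image != NULL);
    if (index_reg > 7)              /* 16-bit CMP immediate only reaches r0-r7 */
        return 0;
    for (size_t n = 1; n <= max_back && tbb_off >= 2 * n; ++n) {
        uint16_t hw = read_hw(image, tbb_off - 2 * n);
        if ((hw & 0xF800u) == 0x2800u && ((hw >> 8) & 7u) == index_reg)
            return (size_t)(hw & 0xFFu) + 1;
    }
    return 0;
}
```

A result of 0 means the heuristic found nothing, in which case some other method is needed to size the table.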

A simple first check is to verify the values in the table point to program code. — Jester
@Jester That's generally true for any possible byte offset. But, you've given me an idea. There has to be at least one branch, and that offset would at least give an idea of where the table ends. Not guaranteed to be right after the table, though. — Chaos
@Jester Right, the table entries are single-byte PC-relative jump offsets. The bytes in real code, interpreted as offsets, are fairly likely to get you to something that you could interpret as instructions. Thumb is a dense instruction set. Conversely, 99 times out of 100, the branch tables could be valid instructions, too. I have no symbols, I just have to follow what I know is code. — Chaos
It is like any other disassembly: some stuff you can't figure out without simulating, and even then you can't always get it. Some file formats (ELF) clearly leave clues as to what sections are or might be; for example, mixed ARM and Thumb code disassembled from ELF comes out correct (other than read-only constants, which disassemblers still attempt to decode). My guess is that there isn't enough detail for what you are doing, and even if you simulated all the code leading up to this table there would still be some chance of getting it wrong. — old_timer

Chaos · Accepted Answer · 2017-02-24T15:19:21

Here's a possible approach I'm trying, which seems to work, at least for some well-behaved branch tables. After decoding the TBB, I start looping over the branch bytes. For each one, I find the address it corresponds to, and keep track of the lowest of these addresses (closest to the end of the branch table). I also check that each branch address is after the currently decoded branch byte, since there may be zero padding at the end of the table.

This depends on there not being any code or data between the end of the table and the beginning of the code referenced from the table. If, for example, the default case were immediately after the table, but not referenced from the table, this would encounter problems. For the examples I have to test, the compiler placed default cases at the end of the other cases.

I'm using Capstone for disassembly; here is some code that should make sense without much context:

case ARM_INS_TBB:
    // Table branch byte
    if(insn->detail->arm.op_count == 1 &&
            insn->detail->arm.operands[0].type == ARM_OP_MEM &&
            insn->detail->arm.operands[0].mem.base == ARM_REG_PC){
        // PC-relative TBB: the table starts right after the instruction
        u64 table = insn_addr + insn->size;
        // lowest branch target seen so far
        u64 min = U64_MAX;
        // loop over table bytes until we reach the lowest branch target
        for(u64 i = 0; table + i < min; ++i){
            // get branch address from table byte
            u64 branchaddr = table + ((u64)binary_image[table + i] << 1);
            // a target at or before the current table byte means
            // zero padding, so the table has ended
            if(branchaddr <= table + i)
                break;
            // only ever lower the minimum, never raise it
            if(branchaddr < min)
                min = branchaddr;
            // do something with the code at branchaddr
        }
    }
    // Instructions immediately after this are junk, stop parsing
    return;
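For experimenting with this scan outside a Capstone switch statement, here is a self-contained sketch of the same idea. The function name and signature are assumptions made for illustration. It takes the image as a byte buffer indexed by offset, adds a bound on the image size, and returns the estimated number of table entries. It carries the same limitation described above: code or data between the table and the lowest branch target would be counted as table entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Estimate the number of byte entries in a TBB table starting at
 * `table_off`. The table is assumed to end at the lowest branch target
 * seen so far, or at the first entry whose target does not lie after
 * the entry itself (for example zero padding).
 */
static size_t tbb_table_length(const uint8_t *image, size_t image_size,
                               size_t table_off)
{
    assert(image != NULL);
    size_t min_target = image_size;   /* never read past the image */
    size_t i = 0;
    while (table_off + i < min_target) {
        size_t entry_off = table_off + i;
        size_t target = table_off + 2u * (size_t)image[entry_off];
        if (target <= entry_off)
            break;                    /* padding: the table has ended */
        if (target < min_target)
            min_target = target;      /* new lowest branch target */
        ++i;
    }
    return i;
}
```

Disassembly would then resume at `table_off` plus the returned length, rounded up to a 2-byte boundary, since Thumb instructions are halfword aligned.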
