1
votes

I would like to divide a stack into stack frames by looking at the raw data on the stack. My idea is to follow the "linked list" of saved EBP pointers.
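To make the idea concrete, here is a minimal sketch of such a walk. It assumes a 32-bit stack dump held in an array of words, a known starting EBP, and that every function on the call path kept the standard prologue. The names `walk_ebp_chain`, `frame_info`, `dump` and `base` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One recovered frame (hypothetical helper type). */
typedef struct {
    uint32_t frame_ptr;   /* address of the saved-EBP slot */
    uint32_t return_addr; /* word stored right above it, at frame_ptr + 4 */
} frame_info;

/*
 * Follow the chain of saved EBP values through a raw stack dump.
 * dump[0] holds the word at address `base`, dump[1] the word at base + 4, etc.
 * Returns the number of frames written to `out`.
 */
size_t walk_ebp_chain(const uint32_t *dump, size_t n_words, uint32_t base,
                      uint32_t ebp, frame_info *out, size_t max_frames)
{
    size_t count = 0;

    while (count < max_frames) {
        /* The frame pointer must lie inside the dump and be 4-byte aligned. */
        if (ebp < base || (ebp - base) % 4 != 0)
            break;
        size_t idx = (ebp - base) / 4;
        if (idx + 1 >= n_words)   /* need the saved EBP and the return address */
            break;

        out[count].frame_ptr = ebp;
        out[count].return_addr = dump[idx + 1];
        count++;

        /* The stack grows down, so the caller's frame sits at a higher address.
           Anything else (including a 0 terminator) ends the walk. */
        uint32_t next = dump[idx];
        if (next <= ebp)
            break;
        ebp = next;
    }
    return count;
}
```

The walk stops as soon as a link looks implausible, which is exactly what happens when some function on the path did not set up EBP.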

  1. Can I assume that a (standard and commonly used) C compiler (e.g. gcc) will always update and save EBP on a function call in the function prologue?

    pushl %ebp
    movl %esp, %ebp

    Or are there cases where some compilers might skip that part for functions that don't get any parameters and don't have local variables?

    The x86 calling conventions and the Wiki article on function prologue don't help much with that.

  2. Is there any better method to divide a stack into stack frames just by looking at its raw data?

Thanks!

2
gcc has the option -fomit-frame-pointer; also, lazy people use a debugger instead of meditating over the raw data... -- Christoph
I wanted to do so programmatically, so using a debugger is not what I meant. -- Inusable Lumière

2 Answers

3
votes

Some versions of gcc have a -fomit-frame-pointer optimization option. If memory serves, it can be used even with parameters/local variables (they index directly off of ESP instead of using EBP). Unless I'm badly mistaken, MS VC++ can do roughly the same.

Offhand, I'm not sure of a way that's anywhere close to universally applicable. If you have code with debug info, it's usually pretty easy -- otherwise though...

2
votes

Even with the framepointer optimized out, stackframes are often distinguishable by looking through stack memory for saved return addresses instead. Remember that a function call sequence in x86 always consists of:

    call someFunc             ; pushes return address (instr. following `call`)
    ...
    someFunc:
    push EBP                  ; if framepointer is used
    mov EBP, ESP              ; if framepointer is used
    push <nonvolatile regs>
    ...

so your stack will always - even if the framepointers are missing - have return addresses in there.

How do you recognize a return address?

  • to start with, on x86, instructions have different lengths. That means return addresses - unlike other pointers (!) - tend to be misaligned values. Statistically, about 3/4 of them do not fall on a multiple of four.
    Any misaligned pointer-sized value is a good candidate for a return address.
  • then, remember that call instructions on x86 have specific opcode formats; read a few bytes before the return address and check if you find a call opcode there (most of the time it's five bytes back for a direct call, i.e. E8 plus a 32-bit offset, and two or three bytes back for an indirect call: FF /2 through a plain register is two bytes, and three bytes with an 8-bit displacement). If so, you've very likely found a return address.
    This is also a way to distinguish C++ vtables from return addresses, by the way: vtable entry points do show up on the stack, but looking "back" from those addresses you don't find call instructions.

With that method, you can get candidates for the call sequence out of the stack even without having symbols, framesize debugging information or anything.
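A minimal sketch of the "look back for a call opcode" check could look like this. It assumes you have the code section as a byte array along with its load address, and it only knows the common near-call encodings (`E8 rel32` and the `FF /2` forms without a SIB byte), so treat it as a filter for candidates rather than proof. The function names and parameters are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned modrm_mod(uint8_t m) { return m >> 6; }        /* addressing mode */
static unsigned modrm_reg(uint8_t m) { return (m >> 3) & 7; }  /* opcode extension */
static unsigned modrm_rm(uint8_t m)  { return m & 7; }         /* base register */

/* Return 1 if the bytes just before `addr` look like a near call instruction. */
int preceded_by_call(const uint8_t *code, size_t code_len,
                     uint32_t code_base, uint32_t addr)
{
    if (addr < code_base)
        return 0;
    size_t off = addr - code_base;
    if (off > code_len)
        return 0;

    /* E8 rel32: direct call, 5 bytes. */
    if (off >= 5 && code[off - 5] == 0xE8)
        return 1;

    /* FF /2 with a register operand or [reg]: 2 bytes. */
    if (off >= 2 && code[off - 2] == 0xFF) {
        uint8_t m = code[off - 1];
        if (modrm_reg(m) == 2 &&
            (modrm_mod(m) == 3 ||
             (modrm_mod(m) == 0 && modrm_rm(m) != 4 && modrm_rm(m) != 5)))
            return 1;
    }

    /* FF /2 [reg + disp8]: 3 bytes. */
    if (off >= 3 && code[off - 3] == 0xFF) {
        uint8_t m = code[off - 2];
        if (modrm_reg(m) == 2 && modrm_mod(m) == 1 && modrm_rm(m) != 4)
            return 1;
    }

    /* FF /2 [reg + disp32] or [disp32]: 6 bytes. */
    if (off >= 6 && code[off - 6] == 0xFF) {
        uint8_t m = code[off - 5];
        if (modrm_reg(m) == 2 &&
            ((modrm_mod(m) == 2 && modrm_rm(m) != 4) ||
             (modrm_mod(m) == 0 && modrm_rm(m) == 5)))
            return 1;
    }
    return 0;
}

/* Scan stack words and record the indices of return-address candidates. */
size_t find_return_candidates(const uint32_t *stack, size_t n_words,
                              const uint8_t *code, size_t code_len,
                              uint32_t code_base,
                              size_t *out_idx, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n_words && found < max_out; i++) {
        if (preceded_by_call(code, code_len, code_base, stack[i]))
            out_idx[found++] = i;
    }
    return found;
}
```

Because x86 instructions can be decoded starting at any byte, a value can pass this test by accident (for instance when `E8` happens to be an operand byte), so expect a few false positives.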

The details of how to piece the actual call sequence together from those candidates are less straightforward, though: you need a disassembler and some heuristics to trace potential call flows from the lowest-found return address all the way up to the last known program location. Maybe one day I'll blog about it ;-) though at this point I'd rather say that the margin of a Stack Overflow posting is too small to contain this ...