2
votes

Suppose I have access to the registers of a program: esp, ebp and eip. eip points to the next instruction to be executed, ebp points to another frame pointer, and esp points to the top of the stack. I understand this much, but I don't understand the rest of the stack or how to parse it.

For example, if I want to get the local variables of a frame, should I just compute ebp - esp (knowing that ebp is a higher address than esp), then go through the addresses in that range and dereference them? Is this the proper way to get the local variables of that particular frame?

Another question: what would be the best way to figure out which function each frame belongs to? If I subtract 1 from the ebp address and then dereference that value, should I get the return address "0x804..."? What is the relationship between this address and the function? For example, if Foo() has a high pc address of 0x8045555 and a low pc address of 0x8045550, will the return address I get lie between these two addresses?

Thanks a lot in advance, and let me know if I wasn't clear enough.

NOTE: If someone has a better title, please suggest it; I couldn't find a better one.

2 Answers

1
votes

The details depend on your CPU instruction set architecture (you're apparently using 32-bit x86) and your compiler toolchain (which I can't guess). Generally, you don't want to write your own code to walk stack frames, because it's complicated and fragile, and it depends on your compiler's optimization and debugging settings.

If you're trying to debug a program, you should start by letting the debugger for your platform try to sort out your stack. For example, using gdb, you can run bt to get a "back-trace".

If you're trying to do this from inside the program in question, and you're using the GNU C library, then you can use the backtrace(3) function.
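As a minimal sketch of that approach, assuming glibc on Linux: the function below captures the current call stack with backtrace() and prints it with backtrace_symbols(). The name print_backtrace and the MAX_FRAMES limit are made up for this example.

```c
#include <assert.h>
#include <execinfo.h>  /* backtrace(), backtrace_symbols(): glibc-specific */
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32  /* arbitrary limit chosen for this sketch */

/* Print the current call stack to `out`, one line per frame.
 * Returns the number of frames that were captured. */
int print_backtrace(FILE *out)
{
    void *addrs[MAX_FRAMES];
    int count;
    char **names;

    assert(out != NULL);

    /* Fill addrs[] with the return addresses of the active frames. */
    count = backtrace(addrs, MAX_FRAMES);

    /* Translate each address into a printable string. The result is a
     * single malloc'd array that the caller must free. */
    names = backtrace_symbols(addrs, count);
    if (names == NULL) {
        return count;  /* out of memory: addresses captured, nothing printed */
    }

    for (int i = 0; i < count; i++) {
        fprintf(out, "#%d %s\n", i, names[i]);
    }

    free(names);
    return count;
}
```

Calling print_backtrace(stderr) from any function prints the chain of callers. To see function names rather than bare addresses you will usually need to link with -rdynamic, and heavy optimization (inlining, tail calls) can make some frames disappear.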

If you just want to understand how things really work, here's a helpful blog post: http://eli.thegreenplace.net/2011/02/04/where-the-top-of-the-stack-is-on-x86/
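To make the frame layout concrete, here is a small sketch under loud assumptions: GCC or Clang, x86 (32- or 64-bit), and a function that really keeps a frame pointer. In that case the word at the frame pointer holds the caller's saved frame pointer, and the next word up (ebp + 4 on 32-bit x86) holds the return address; the locals sit below the frame pointer, towards esp. The struct and function names are invented for illustration, and other platforms or compiler settings may lay things out differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout at the frame pointer on x86 when frame pointers are kept:
 *   fp[0]  caller's saved frame pointer (the "old ebp")
 *   fp[1]  return address pushed by the call instruction
 * This is an assumption about the platform, not something C guarantees. */
struct frame_record {
    struct frame_record *caller_fp;
    void *return_address;
};

/* Returns 1 if the return address read by hand through the frame pointer
 * matches the one the compiler reports for this same function.
 * noinline makes sure the function has a frame of its own. */
__attribute__((noinline))
int frame_layout_matches(void)
{
    /* GCC/Clang builtin: the frame address (ebp/rbp on x86) of this function. */
    struct frame_record *fp = __builtin_frame_address(0);
    assert(fp != NULL);

    /* GCC/Clang builtin: where this function will return to. */
    return fp->return_address == __builtin_return_address(0);
}
```

Following fp->caller_fp repeatedly is the basic idea behind a frame-pointer stack walk, but it only holds up while every function in the chain was compiled with a frame pointer, which is why a debugger or backtrace(3) is the safer route.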

For a deeper understanding, try Wikipedia's x86 Calling Conventions article. To go deeper still, if you're on an ELF-based platform like Linux, see the ELF ABI specifications.

1
votes

The data in the registers and on the stack is just a series of bytes. The instructions and applications that write those bytes give them structure, but the information describing that structure is not part of the bytes themselves. It is additional information that may or may not be available.

For instance, when source code is compiled, the compiler can include additional descriptive information alongside the binary code it produces. How much of it is included depends on the compiler options chosen and on the capabilities of the other tools in the tool chain.

If you create a debug build that includes additional information such as function names, and the tool chain can use that information when displaying the binary code, then you get quite a good view of variables and function names, and you can step through the code at the source level.

On the other hand, if you create an optimized build with no additional information, the tool chain cannot display the binary code at the source level (source lines, variables, function names, etc.), even if it supports doing so, because the information simply is not there.

So to accomplish what you want to do, you need the additional information, so that you can combine it with the binary code. Without it, all you have is a bunch of bytes in memory. You can display these bytes in various ways, such as interpreting them as assembler code or as text strings, but without the additional information you are just making guesses, however educated.
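As a sketch of what combining an address with that additional information can look like, assume a hypothetical table of function address ranges, similar to the low pc / high pc values in the question. Everything here is invented for illustration: the struct, the function name and the table contents (the "main" range in particular), and high_pc is assumed to be one past the function's last byte. Real debug formats are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record: one address range per function. */
struct func_range {
    const char *name;
    uintptr_t low_pc;   /* address of the function's first byte */
    uintptr_t high_pc;  /* assumed exclusive: one past the last byte */
};

/* Made-up example data; the Foo range reuses the numbers from the question. */
static const struct func_range example_table[] = {
    { "Foo",  0x8045550u, 0x8045555u },
    { "main", 0x8045560u, 0x80455a0u },
};

/* Return the name of the function whose range contains addr,
 * or NULL if no range in the table matches. */
const char *find_function(const struct func_range *table, size_t n,
                          uintptr_t addr)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (addr >= table[i].low_pc && addr < table[i].high_pc) {
            return table[i].name;
        }
    }
    return NULL;
}
```

Note that a return address stored in Foo's frame points back into whichever function called Foo, so looking it up this way names the caller. It is the current eip, not the saved return address, that falls inside Foo's own range while Foo is running.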

Different compilers and tool chains on different platforms and operating systems generate different kinds of additional information, so you will need to find out exactly what your tool chain provides.