Understanding executing a function at the machine level

Question

I'm taking a class in Language Based Security and I have to know step by step what is happening in the stack when a function is executed properly, so that later I can learn how to protect against exploits. So far I have a pretty good understanding of what is pushed and popped from the stack and how the ESP, EBP move about to keep track of frames. Also, I know that the EIP is saved on the stack.

What I don't know is where the code in the function is actually executed to get a result (I presume somewhere else in memory, the heap?). If I give a walkthrough of a simple function, can someone explain the missing bits? (I'll tag those parts with questions.) Presume a simple function:

int add(int x, int y)
{
   int sum = x + y;
   return sum;
}

which is called in main() with add(3,4);

At the initialization of a new function, the stack (from lowest address to highest) has the ESP pointing to the top and the EBP pointing to the base of the new frame. Below that is main().

Now, the parameters are pushed onto the stack from right to left. Then the function call saves the contents of EIP on the stack. [This is the address of the next instruction to be executed after the function returns?]

Now the Prolog part: The old EBP address is pushed onto stack and the EBP is made to point to the ESP. Finally, the local variables are pushed onto the stack [Are these just the addresses of where their values are stored?]

The Epilog is when the stack is to be unwound for the current frame. ESP is moved to EBP, so that local variables are inaccessible (normally). Old EBP is popped off stack, and made to point to its original address. ESP moves to point to saved EIP which was where it was before add(3,4) was called.

In the explanation I was given to study, the final part is that the return instruction pops the saved EIP value back into the EIP register. [Surely this isn't the return statement in the function but a ret instruction at machine level, right?]

Last question, can someone explain what's going on when the code in the function is executing and at what point during all that does the call, prolog and epilog occur? Or provide a good link to a clear explanation?

Thanks heaps in advance (so to speak :)

The way to guard against exploits is to prevent input which will overflow the space assigned to it. Function arguments and local variables like int i; or char *ptr cannot overflow the stack, but arrays such as char[20]; can, and arrays using memory dynamically allocated to char *ptr can overflow the allocation on the heap. — Weather Vane
Why don't you compile a program and step through a function using a debugger? You'll learn a lot more that way than anything we could tell you here. — JS1
When the x86 call instruction is executed, the address of the instruction following the call is pushed on the stack. Function local variables are usually not pushed on the stack - rather the stack pointer is decremented to allocate needed storage for locals, which are initialized later. And the return is indeed an assembly instruction. — Craig S. Anderson

tux3 · Accepted Answer · 2015-04-10T18:29:37

First, I compiled and then disassembled your function so you could see what's actually going on at the ASM level. I disabled optimizations and compiled to 32-bit code to keep things simple:

Dump of assembler code for function add:
   0x080483cb <+0>:     push   %ebp
   0x080483cc <+1>:     mov    %esp,%ebp
   0x080483ce <+3>:     sub    $0x10,%esp
   0x080483d1 <+6>:     mov    0x8(%ebp),%edx
   0x080483d4 <+9>:     mov    0xc(%ebp),%eax
   0x080483d7 <+12>:    add    %edx,%eax
   0x080483d9 <+14>:    mov    %eax,-0x4(%ebp)
   0x080483dc <+17>:    mov    -0x4(%ebp),%eax
   0x080483df <+20>:    leave  
   0x080483e0 <+21>:    ret    
End of assembler dump.

Try to look at the disassembly above and recognize what it's doing and how it matches your C code. Now to answer your questions.

Now the Prolog part: The old EBP address is pushed onto stack and the EBP is made to point to the ESP. Finally, the local variables are pushed onto the stack [Are these just the addresses of where their values are stored?]

Here the prolog goes from 0x080483cb <+0> to 0x080483ce <+3> inclusive. First we create a frame with push %ebp; mov %esp,%ebp as you said, and then we allocate 0x10 bytes of space for local variables on the stack with sub $0x10,%esp. All that this instruction does is move the stack pointer 0x10 bytes down. It doesn't store any values, it just leaves some space there that we can use for local variables if we want to (and we'll see that the compiler doesn't even use all of it!).

Next we have the actual logic of the function. First we load the two arguments x and y from the stack into registers:

0x080483d1 <+6>:     mov    0x8(%ebp),%edx
0x080483d4 <+9>:     mov    0xc(%ebp),%eax

We add them together:

0x080483d7 <+12>:    add    %edx,%eax

Now we store the result in a local variable. That local variable is really just the space on the stack we allocated in the prolog. We allocated 0x10 bytes for local variables, and here we only use 4 of them, the ones just below EBP at -0x4(%ebp), to store the result of the addition:

0x080483d9 <+14>:    mov    %eax,-0x4(%ebp)

And because there aren't any optimizations, we immediately load that result right from the local variable back to a register so that we can return it:

0x080483dc <+17>:    mov    -0x4(%ebp),%eax

As you can see, the code is incredibly inefficient, but at least it's fairly easy to read. Now only the epilog is left, and it's pretty simple:

0x080483df <+20>:    leave  
0x080483e0 <+21>:    ret

The leave destroys the frame we created in the prolog (it is equivalent to mov %ebp,%esp followed by pop %ebp), and the ret returns to the next instruction of the calling function.

The Epilog is when the stack is to be unwound for the current frame. ESP is moved to EBP, so that local variables are inaccessible (normally). Old EBP is popped off stack, and made to point to its original address. ESP moves to point to saved EIP which was where it was before add(3,4) was called.

In the explanation I was given to study, the final part is that the return instruction pops the saved EIP value back into the EIP register. [Surely this isn't the return statement in the function but a ret instruction at machine level, right?]

Yes. The return statement in the function compiles to machine code that ends in a ret instruction: the return value is placed in EAX, the epilog runs, and then ret executes. Remember that your computer doesn't run C code directly; all of that C is compiled to machine code first, and the ret instruction here is indeed what pops the saved return address off the stack into EIP.

Last question, can someone explain what's going on when the code in the function is executing and at what point during all that does the call, prolog and epilog occur? Or provide a good link to a clear explanation?

The disassembly that you see above is a rough text representation of what the computer runs. When your program is running, its machine code is loaded in memory in its own region (the code, or text, segment), not on the heap or the stack. EIP contains the address of the next instruction that the computer will run, so it points directly at those instructions in memory.

So the computer will just run the function in the order it is written, and the prolog and epilog are part of the function.

The prolog and epilog are a convention, but they are just code. You could completely remove the prolog and write a crazy epilog if you wanted, and it would still work, as long as the stack pointer points at the saved return address when ret executes.

I'd recommend that you go play with disassemblers and debuggers, to familiarize yourself with how it actually works. It's not that hard and very logical.
