2
votes

So... I'm compiling into assembler, with gcc -S -O2 -m32:

void h(int y){int x; x=y+1; f(y); f(2); }

And it gives me the following:

.file   "sample.c"
.text
.p2align 4,,15
.globl h
.type   h, @function
 h:
pushl   %ebp
movl    %esp, %ebp
subl    $24, %esp
movl    8(%ebp), %eax
movl    %eax, (%esp)
call    f
movl    $2, 8(%ebp)
leave
jmp f
.size   h, .-h
.ident  "GCC: (GNU) 4.4.3 20100127 (Red Hat 4.4.3-4)"
.section    .note.GNU-stack,"",@progbits

Now I know what pushl and movl do: they store the current frame pointer on the stack and then set the frame pointer register to the value of the stack pointer.

  1. But I have no idea what the subl $24, %esp is. I understood that it moves the stack pointer down by 24 bytes. Correct?
  2. What is immed by the way?
  3. Why does movl 8(%ebp), %eax use 8? Is it 8 bytes? Is this to accommodate the return value + argument y to h? Or am I completely off here? So this means look back 8 bytes from the stack pointer?
  4. What does movl $2, 8(%ebp) do? It copies the constant 2 to the location 8 bytes before the frame pointer. Did the frame pointer change when we called f? If yes, then 8(%ebp) points to the argument location for f.
  5. What does leave do? How can it "remove" a stack frame? I mean you can't just remove a piece of memory. In the doc it says it does mov(esp, ebp), pop ebp.

Thanks!

2
Interestingly, the answer below was marked as accepted although it doesn't actually give an explanation to question 1. Here is another question/answer that gives an explanation to 1. - andreee

2 Answers

5
votes

To answer those numbered questions:

1) subl $24,%esp

means esp = esp - 24

GNU as uses AT&T syntax, which reverses the operand order of Intel syntax: AT&T has the destination on the right, Intel has the destination on the left. AT&T is also explicit about the operand size through the instruction suffix (the l in subl means 32-bit), whereas Intel syntax deduces the size from the operands or makes you state it explicitly.

The stack grows down in memory. The memory at and above esp holds the stack contents, and addresses lower than esp are unused stack space. esp points to the last thing pushed onto the stack.
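To make the "grows down" part concrete, here is a minimal sketch in C of a toy stack. The names toy_stack, toy_push, toy_reserve and toy_top are made up for illustration, and esp is modelled as an offset into a small byte array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: 64 bytes of "stack memory"; esp is an offset into it. */
enum { STACK_BYTES = 64 };

struct toy_stack {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;   /* offset of the last pushed value; starts one past the end */
};

static void toy_init(struct toy_stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;           /* empty stack: nothing pushed yet */
}

/* pushl value: move esp down by 4, then store at the new esp. */
static void toy_push(struct toy_stack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* subl $n, %esp: reserve n bytes without storing anything in them. */
static void toy_reserve(struct toy_stack *s, uint32_t n) {
    assert(s->esp >= n);
    s->esp -= n;
}

/* Read the 32-bit value that esp currently points to. */
static uint32_t toy_top(const struct toy_stack *s) {
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->esp], sizeof value);
    return value;
}
```

Starting from toy_init, one toy_push leaves esp at 60, and a following toy_reserve(&s, 24) leaves it at 36: both operations only ever move esp toward lower offsets.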

2) x86 instruction encoding mostly allows the following:

movl rm,r   ' move value from register or memory to a register
movl r,rm   ' move a value from a register to a register or memory
movl imm,rm ' Move immediate value.

There is no memory-to-memory instruction format. (Strictly speaking, you can do memory-to-memory operations with movs or with push mem followed by pop mem, but neither takes two explicit memory operands in one instruction.)

"Immediate" means the value is encoded right into the instruction. For example, to store 15 at the address in ebx:

movl $15,(%ebx)

15 is an "immediate" value.

The parentheses make it use the register as a pointer to memory.

3) movl 8(%ebp),%eax

means,

  • take the value of ebp
  • add 8 to it (does not modify ebp though),
  • use it as an address (the parentheses),
  • read the 32-bit value from that address,
  • and store the value in eax

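ebp is the frame pointer here; right after the prologue's movl %esp, %ebp it holds the same address as esp. In 32-bit mode, each push and pop on the stack is 4 bytes wide, and most variables take up 4 bytes anyway. At 0(%ebp) is the saved ebp that was just pushed, at 4(%ebp) is the return address pushed by the call, and at 8(%ebp) is the first argument. So 8(%ebp) means: starting at the frame pointer, give me the value two 4-byte slots (4 x 2 = 8) up the stack, which here is y. The offset is positive, toward higher addresses, so it is not 8 bytes "back".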
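The address calculation in the steps above can be sketched in C. This is only a toy model: load32, store32 and read_first_argument are hypothetical helpers, memory is a small byte array, and the saved ebp and return address are made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Like `movl disp(%base), %reg`: read 32 bits at base + disp. base is not modified. */
static uint32_t load32(const uint8_t *mem, size_t mem_size, uint32_t base, int32_t disp) {
    uint32_t addr = base + (uint32_t)disp;
    uint32_t value;
    assert(mem_size >= 4 && addr <= mem_size - 4);
    memcpy(&value, mem + addr, sizeof value);
    return value;
}

/* Like `movl %reg, (addr)`: write 32 bits at addr. */
static void store32(uint8_t *mem, size_t mem_size, uint32_t addr, uint32_t value) {
    assert(mem_size >= 4 && addr <= mem_size - 4);
    memcpy(mem + addr, &value, sizeof value);
}

/* Builds the three slots h() sees just above ebp and returns what 8(%ebp) reads. */
static uint32_t read_first_argument(uint32_t y) {
    uint8_t mem[32] = {0};
    uint32_t ebp = 8;                            /* arbitrary offset for the demo */
    store32(mem, sizeof mem, ebp + 0, 0x1000u);  /* saved caller ebp (made-up value) */
    store32(mem, sizeof mem, ebp + 4, 0x2000u);  /* return address (made-up value) */
    store32(mem, sizeof mem, ebp + 8, y);        /* first argument */
    return load32(mem, sizeof mem, ebp, 8);      /* movl 8(%ebp), %eax */
}
```

Calling read_first_argument(7) returns 7: the load skips the two 4-byte slots at 0(%ebp) and 4(%ebp) and lands on the argument.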

Typically, 32-bit code uses ebp to point to the beginning of the local variables in a function. In 16-bit x86 code, there was no way to use the stack pointer as a pointer (hard to believe, right?), so people copied sp to bp and used bp as the local frame pointer. This became unnecessary when 32-bit mode arrived with the 80386, which can address memory through the stack pointer directly. However, ebp makes debugging easier (it's trivially easy to make a stack dump if ebp is being used), so we ended up continuing to use ebp in 32-bit code.

Thankfully, amd64 gave us a new ABI that does not require rbp as a frame pointer. Optimized 64-bit code typically uses rsp to access local variables, which leaves rbp available to hold a variable.

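4) This is the same addressing as in 3), but used as a store: movl $2, 8(%ebp) writes the immediate 2 into the slot that holds h's argument y. The frame pointer did not change across the call, because f must preserve ebp. h no longer needs y, so the compiler reuses that slot as the argument for f(2).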

5) leave is an old instruction that simply does movl %ebp,%esp and popl %ebp and saves a few code bytes. What it actually does is undo the changes to the stack and restore the caller's ebp. The called function must preserve ebp in the x86 ABI.

On entry to the function, the compiler did subl $24,%esp to make room for local variables and for temporary storage when it does not have enough registers to hold everything.
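A minimal sketch in C of how the prologue and leave mirror each other. The toy_cpu struct and its helpers are made up for illustration, and the starting register values in the note below are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_BYTES = 128 };

/* Hypothetical toy machine: a byte array plus the two registers involved. */
struct toy_cpu {
    uint8_t  mem[MEM_BYTES];
    uint32_t esp;
    uint32_t ebp;
};

static void push32(struct toy_cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, sizeof v);
}

static uint32_t pop32(struct toy_cpu *c) {
    uint32_t v;
    assert(c->esp + 4 <= MEM_BYTES);
    memcpy(&v, &c->mem[c->esp], sizeof v);
    c->esp += 4;
    return v;
}

/* pushl %ebp; movl %esp, %ebp; subl $locals, %esp */
static void prologue(struct toy_cpu *c, uint32_t locals) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= locals);
    c->esp -= locals;
}

/* leave == movl %ebp, %esp; popl %ebp */
static void leave(struct toy_cpu *c) {
    c->esp = c->ebp;       /* drop everything reserved below the frame pointer */
    c->ebp = pop32(c);     /* restore the caller's ebp */
}
```

With esp = 100 and ebp = 120 on entry, prologue(&c, 24) leaves ebp = 96 and esp = 72, and leave(&c) puts both registers back to 100 and 120. Nothing is erased from memory; the frame is "removed" only in the sense that esp no longer covers it.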

The best way to "imagine" the stack frame in your mind is to see it as a structure sitting on the stack. The first members of the imaginary structure are the most recently "pushed" values. So when you push to a stack, imagine inserting a new member at the beginning of the structure, while none of the other members moved. When you "pop" from the stack, you get the value of the first member of the imaginary struct, and that (first) line of the structure disappears from existence.

Stack frame manipulation is mostly just moving the stack pointer to make more or less room in that imaginary struct we call the stack frame. Subtracting from the stack pointer just puts multiple imaginary members at the start of the struct in one step. Adding to the stack pointer makes the first so many members disappear.
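Taking the "imaginary struct" picture literally, the frame of h in the question could be written down roughly like this. It is a sketch only: the struct and member names are hypothetical, the return address and y were actually put there by the caller, and the layout assumes the usual case where the compiler adds no padding between these members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of the stack as h() sees it after `subl $24, %esp`,
   lowest address (where esp points) first. */
struct h_frame {
    uint32_t outgoing_arg;     /* (%esp): argument slot for the call to f */
    uint8_t  unused[20];       /* rest of the 24 reserved bytes */
    uint32_t saved_ebp;        /* 0(%ebp): pushed by `pushl %ebp` */
    uint32_t return_address;   /* 4(%ebp): pushed by the caller's `call h` */
    uint32_t y;                /* 8(%ebp): argument pushed by the caller */
};

/* Assumes no padding, which holds on common ABIs for this member order. */
static_assert(sizeof(struct h_frame) == 36, "unexpected padding in h_frame");

/* Offset of a member relative to ebp, which points at saved_ebp. */
#define EBP_OFFSET(member) \
    ((long)offsetof(struct h_frame, member) - (long)offsetof(struct h_frame, saved_ebp))
```

EBP_OFFSET(y) evaluates to 8 and EBP_OFFSET(outgoing_arg) to -24, which matches the 8(%ebp) and (%esp) operands in the listing.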

The end of the code you posted is not typical. That jmp is typically a ret. The compiler was clever about it and did a "tail call optimization", meaning it just cleans up what it did to the stack and jumps to f. When f(2) returns, it will actually return straight to the caller of h (not back to the code you posted).
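For reference, this is what "tail position" looks like at the C level. The f below is a hypothetical stand-in that just records its arguments, since the question does not show the real f. Whether a compiler actually turns the last call into a jmp depends on the compiler and the optimization flags.

```c
#include <assert.h>

/* Hypothetical stand-in for f(): records its arguments so the call order is visible. */
static int seen[8];
static int seen_count = 0;

static void f(int v) {
    assert(seen_count < 8);
    seen[seen_count++] = v;
}

/* Same shape as the h() in the question (the unused local x is dropped). */
static void h(int y) {
    f(y);   /* ordinary call: h still has work to do afterwards */
    f(2);   /* tail position: h does nothing after this call returns */
}
```

Because h has nothing left to do after f(2), its own frame is no longer needed at that point, which is what allows the frame to be torn down before control transfers to f.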

4
votes

The compiler is reserving space on the stack for locals and whatever other needs it might have. I'm not sure offhand why it's reserving 24 bytes (it doesn't seem to need or use it all).

When calling function f(), instead of using a push instruction to put the parameter on the stack, it uses a simple movl to the last location it reserved:

movl    8(%ebp), %eax    # get the value of `y` passed in to `h()`
movl    %eax, (%esp)     # put that value on the stack for call to `f()`

A more interesting (in my opinion) thing happening here is how the compiler is handling the call to f(2):

movl    $2, 8(%ebp)      # store 2 in the `y` argument passed to `h()`
                         #     since `h()` won't be using `y` anymore
leave                    # get rid of the stackframe for `h()`
jmp f                    # jump to `f()` instead of calling it - it'll return
                         #     directly to whatever called `h()`
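Why storing to 8(%ebp) works as the argument for f can be checked with a little address arithmetic. The sketch below is a toy model in C: the regs struct and both helpers are made up, and the addresses are arbitrary numbers where only the differences matter.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register pair; the values are addresses in a made-up address space. */
struct regs {
    uint32_t esp;
    uint32_t ebp;
};

/* Address where a function finds its first argument on entry: 4(%esp),
   because (%esp) holds the return address at that point. */
static uint32_t callee_first_arg_addr(struct regs r) {
    return r.esp + 4;
}

/* Register state after `leave` (movl %ebp, %esp; popl %ebp).
   saved_ebp is the value that was stored at 0(%ebp). */
static struct regs after_leave(struct regs r, uint32_t saved_ebp) {
    struct regs out;
    assert(r.ebp >= r.esp);    /* the frame pointer sits at or above esp */
    out.esp = r.ebp + 4;       /* esp = ebp, then the pop adds 4 */
    out.ebp = saved_ebp;
    return out;
}
```

With ebp = 96 inside h, the store goes to address 104. After leave, esp is 100, so f looks for its first argument at 4(%esp) = 104, the same slot.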

To answer your question, "What is immed by the way?": that is what the instruction reference uses to indicate that the value is encoded in the instruction itself instead of coming from somewhere else, like a register or memory location.