3
votes

I'm in the process of writing a compiler purely as a learning experience. I'm currently learning about stack frames by compiling simple c++ code and then studying the output asm produced by gcc 4.9.2 for Windows x86.

my simple c++ code is

#include <iostream>

using namespace std;

int globalVar;

void testStackStuff(void);
void testPassingOneInt32(int v);
void forceStackFrameCreation(int v);

int main()
{
  globalVar = 0;

  testStackStuff();

  std::cout << globalVar << std::endl;
}

void testStackStuff(void)
{
  testPassingOneInt32(666);
}

void testPassingOneInt32(int v)
{
  globalVar = globalVar + v;

  forceStackFrameCreation(v);
}

void forceStackFrameCreation(int v)
{
  globalVar = globalVar + v;
}

Ok, when this is compiled with -mpreferred-stack-boundary=4 I was expecting to see a stack aligned to 16 bytes (technically it is aligned to 16 bytes but with an extra 16 bytes of unused stack space). The prologue for main as produced by gcc is:

22                      .loc 1 12 0
23                      .cfi_startproc
24 0000 8D4C2404        lea ecx, [esp+4]
25                      .cfi_def_cfa 1, 0
26 0004 83E4F0          and esp, -16
27 0007 FF71FC          push    DWORD PTR [ecx-4]
28 000a 55              push    ebp
29                      .cfi_escape 0x10,0x5,0x2,0x75,0
30 000b 89E5            mov ebp, esp
31 000d 51              push    ecx
32                      .cfi_escape 0xf,0x3,0x75,0x7c,0x6
33 000e 83EC14          sub esp, 20
34                      .loc 1 12 0
35 0011 E8000000        call    ___main
35      00
36                      .loc 1 13 0
37 0016 C7050000        mov DWORD PTR _globalVar, 0
38                      .loc 1 15 0
39 0020 E8330000        call    __Z14testStackStuffv

line 26 rounds esp down to the nearest 16 byte boundary.

lines 27, 28 and 31 push a total of 12 bytes onto the stack, then

line 33 subtracts another 20 bytes from esp, giving a total of 32 bytes!

Why?

line 39 then calls testStackStuff.

NOTE - this call pushes the return address (4 bytes).

Now, lets look at the prologue for testStackStuff, keeping in mind that the stack is now 4 bytes closer to the next 16 byte boundary.

67 0058 55              push    ebp
68                      .cfi_def_cfa_offset 8
69                      .cfi_offset 5, -8
70 0059 89E5            mov ebp, esp
71                      .cfi_def_cfa_register 5
72 005b 83EC18          sub esp, 24
73                      .loc 1 22 0
74 005e C704249A        mov DWORD PTR [esp], 666

line 67 pushes another 4 bytes (now 8 bytes towards the boundary).

line 72 subtracts another 24 bytes (total 32 bytes).

At this point the stack is now aligned correctly on a 16 byte boundary. But why the multiple of 2?

If I change the compiler flags to -mpreferred-stack-boundary=5 I would expect a stack aligned to 32 bytes, but again gcc seems to produce stack frames aligned to 64 bytes, twice the amount I was expecting.

Prologue for main

23                      .cfi_startproc
24 0000 8D4C2404        lea ecx, [esp+4]
25                      .cfi_def_cfa 1, 0
26 0004 83E4E0          and esp, -32
27 0007 FF71FC          push    DWORD PTR [ecx-4]
28 000a 55              push    ebp
29                      .cfi_escape 0x10,0x5,0x2,0x75,0
30 000b 89E5            mov ebp, esp
31 000d 51              push    ecx
32                      .cfi_escape 0xf,0x3,0x75,0x7c,0x6
33 000e 83EC34          sub esp, 52
34                      .loc 1 12 0
35 0011 E8000000        call    ___main
35      00
36                      .loc 1 13 0
37 0016 C7050000        mov DWORD PTR _globalVar, 0
37      00000000 
37      0000
38                      .loc 1 15 0
39 0020 E8330000        call    __Z14testStackStuffv

line 26 rounds esp down to the nearest 32 byte boundary

lines 27, 28 and 31 push a total of 12 bytes onto the stack, then

line 33 subtracts another 52 bytes from esp, giving a total of 64 bytes!

and the prologue for testStackStuff is

66                      .cfi_startproc
67 0058 55              push    ebp
68                      .cfi_def_cfa_offset 8
69                      .cfi_offset 5, -8
70 0059 89E5            mov ebp, esp
71                      .cfi_def_cfa_register 5
72 005b 83EC38          sub esp, 56
73                      .loc 1 22 0

(4 bytes on stack from) call __Z14testStackStuffv

(4 bytes on stack from) push ebp

(56 bytes on stack from) sub esp,56

total 64 bytes.

Does anybody know why gcc is creating this extra stack space or have I overlooked something obvious?

Thanks for any help you can offer.

2
but again gcc seems to produce stack frames aligned to 64 bytes. No, it used and esp, -32. The stack frame size looks like 64 bytes, but its alignment is only 32B.Peter Cordes
related: stackoverflow.com/questions/38781118/… explains the push DWORD PTR [ecx-4] part.Peter Cordes

2 Answers

2
votes

In order to resolve this mystery, you will need to look at the documentation of gcc to find out exactly which flavor of Application Binary Interface (ABI) it uses, and then go find the specification of that ABI and read it. If you are "in the process of writing a compiler purely as a learning experience" you will definitely need it.

In short, and in broad terms, what is happening is that the ABI mandates that this extra space be reserved by the current function, for the purpose of passing parameters to functions invoked by the current function. The decision of how much space to reserve depends primarily on the amount of parameter passing that the function intends to do, but it is a bit more nuanced than that, and the ABI is the document which explains it in detail

In the old style of stack frames, we would PUSH parameters to the stack, and then invoke a function.

In the new style of stack frames, EBP is not used anymore, (not sure why it is preserved and copied from ESP anymore,) parameters are placed in the stack at a specific offset with respect to ESP, and then the function is invoked. This is evidenced by the fact that mov DWORD PTR [esp], 666 is used to pass the 666 argument to the call testPassingOneInt32(666);.

2
votes

For why it's doing the push DWORD PTR [ecx-4] to copy the return address, see this partial duplicate. IIRC, it's constructing a complete copy of the return-address / saved-ebp pair.


but again gcc seems to produce stack frames aligned to 64 bytes

No, it used and esp, -32. The stack frame size looks like 64 bytes, but its alignment is only 32B.

I'm not sure why it leaves so much extra space in the stack frame. It's not very interesting to guess why gcc -O0 does what it does, because it's not even trying to be optimal.

You obviously compiled without optimization, which makes the whole thing less interesting. This tells you more about gcc internals and what was convenient for gcc, not that the code it emitted was necessary or does anything useful. Also, use http://gcc.godbolt.org/ to get nice asm output without the CFI directives and other noise. (Please tidy up the asm code blocks in your question with output from that. All the noise makes them harder to read.)