GCC's assembly output of an empty program on x86, win32

Question

I write empty programs to annoy the hell out of stackoverflow coders, NOT. I am just exploring the gnu toolchain.

Now the following might be too deep for me, but to continue the empty program saga I have started to examine the output of the C compiler, the stuff GNU as consumes.

gcc version 4.4.0 (TDM-1 mingw32)

test.c:

int main()
{
    return 0;
}

gcc -S test.c

    .file   "test.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .text
.globl _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    call    ___main
    movl    $0, %eax
    leave
    ret

Can you explain what happens here? Here is my effort to understand it. I have used the as manual and my minimal x86 ASM knowledge:

.file "test.c" is the directive for the logical filename.
.def: according to the docs "Begin defining debugging information for a symbol name". What is a symbol (a function name/variable?) and what kind of debugging information?
.scl: docs say "Storage class may flag whether a symbol is static or external". Is this the same static and external I know from C? And what is that '2'?
.type: stores the parameter "as the type attribute of a symbol table entry", I have no clue.
.endef: no problem.
.text: Now this is problematic, it seems to be something called a section and I have read that it's the place for code, but the docs didn't tell me too much.
.globl "makes the symbol visible to ld.", the manual is quite clear on this.
_main: This might be the starting address (?) for my main function
pushl: A long (32-bit) push, which places EBP on the stack
movl: 32-bit move. Pseudo-C: EBP = ESP;
andl: Logical AND. Pseudo-C: ESP = -16 & ESP, I don't really see what the point of this is.
call: Pushes the IP to the stack (so the called procedure can find its way back) and continues where __main is. (what is __main?)
movl: this zero must be the constant I return at the end of my code. The MOV places this zero into EAX.
leave: restores stack after an ENTER instruction (?). Why?
ret: goes back to the instruction address that is saved on the stack

Thank you for your help!

I found the COFF specification. This should give some references to what "32" in ".type" means etc: microsoft.com/whdc/system/platform/firmware/PECOFFdwn.mspx — Johannes Schaub - litb

nos · Accepted Answer · 2009-08-22T23:34:12

.file "test.c"

Commands starting with . are directives to the assembler. This one just says the source file is "test.c"; that information can be exported to the debugging information of the exe.

.def ___main; .scl 2; .type 32; .endef

.def directives define a debugging symbol. .scl 2 means storage class 2 (the external storage class), and .type 32 says this symbol is a function. These numbers are defined by the PE/COFF executable format.

___main is a function that takes care of the bootstrapping gcc needs (it does things like running C++ static initializers and other housekeeping).

.text

Begins a text section - code lives here.

.globl _main

Declares the _main symbol as global, which makes it visible to the linker and to other modules that are linked in.

.def        _main;  .scl    2;      .type   32;     .endef

Same thing as for ___main above: creates debugging symbols stating that _main is a function. This can be used by debuggers.

_main:

Defines a new label (it'll end up as an address). The .globl directive above makes this address visible to other entities.

pushl       %ebp

Saves the old frame pointer (the ebp register) on the stack, so it can be put back in place when this function ends.

movl        %esp, %ebp

Copies the stack pointer into the ebp register. ebp is often called the frame pointer: it marks the base of the stack values belonging to the current "frame" (usually a function), so locals and arguments can be addressed relative to it (referring to variables on the stack via ebp can also help debuggers).

andl $-16, %esp

Ands the stack pointer with 0xfffffff0, which effectively rounds it down to a 16-byte boundary. Access to aligned values on the stack can be faster than access to unaligned ones, and some SSE instructions require 16-byte-aligned operands. All these preceding instructions are pretty much a standard function prologue.

call        ___main

Calls the ___main function, which does the initialization gcc needs. call pushes the return address (the address of the instruction following the call) onto the stack and jumps to the address of ___main.

movl        $0, %eax

Moves 0 into the eax register (the 0 in return 0;). The eax register holds integer function return values under the cdecl calling convention that main uses here (stdcall uses eax the same way).

leave

The leave instruction is pretty much shorthand for

movl     %ebp, %esp
popl     %ebp

i.e. it undoes the stuff done at the start of the function, restoring the frame pointer and the stack pointer to their former state.

ret

Returns to whoever called this function. It'll pop the instruction pointer from the stack (which a corresponding call instruction will have placed there) and jump there.
