5
votes

Taking an empty program

//demo.c

int main(void)
{

}

Compiling the program at default optimization.

gcc -S  demo.c -o dasm.asm 

I get the assembly output as

//Removed labels and directives which are not relevant

main:

pushl   %ebp                  // prologue of main
movl    %esp, %ebp            // prologue of main
popl    %ebp                  // epilogue of main
ret

Now compiling the program at -O2 optimization.

gcc -O2 -S  demo.c -o dasm.asm 

I get the optimized assembly

main:

rep
ret

In my initial search, I found that the optimization flag -fomit-frame-pointer was responsible for removing the prologue and epilogue.

I found more information about the flag in the gcc compiler manual, but I could not understand the reason the manual gives, quoted below, for removing the prologue and epilogue.

Don't keep the frame pointer in a register for functions that don't need one.

Is there any other way of putting the above reason?

What is the reason for the "rep" instruction appearing at -O2 optimization?

Why does the main function not require stack frame initialization?

If the frame pointer is not set up from within the main function, then who does this job?

Is it done by the OS, or is it a function of the hardware?

1
rep ret is a ret with a prefix that doesn't alter the semantics. It keeps some AMD processors happy (some of them have a penalty for jumping directly to a ret). - harold

1 Answer

5
votes

Compilers are getting smart. GCC could tell that you didn't need a frame pointer kept in a register, because nothing in your main() function has locals or arguments on the stack that would be addressed through it.
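If you want to see the difference for yourself, here is a rough sketch. The file name and both function names are made up for illustration, and the exact assembly depends on your GCC version and target. The first function has nothing to address through a frame pointer. The second uses a variable-length array, so its frame size isn't known at compile time and compilers typically keep a frame pointer for it even at -O2.

```c
/* frame_demo.c: hypothetical file name, for illustration only */
#include <assert.h>

/* Leaf function with no stack-resident locals: at -O2 GCC normally
   emits no frame-pointer setup for it. */
int leaf_add(int a, int b)
{
    return a + b;
}

/* The variable-length array makes the frame size unknown at compile
   time, so compilers typically keep a frame pointer here. */
int sum_below(int n)
{
    assert(n > 0);              /* a VLA must have a positive length */
    volatile int values[n];     /* volatile so the array is not optimized away */
    int total = 0;

    for (int i = 0; i < n; i++)
        values[i] = i;          /* store 0, 1, ..., n-1 */
    for (int i = 0; i < n; i++)
        total += values[i];     /* sum the stored values */
    return total;
}
```

Compiling this with gcc -O2 -S (add -c, or your own main, since the sketch has none) and comparing the two functions should show the prologue in sum_below only, assuming your compiler behaves like the ones I've described.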

As for rep ret:

Here's the principle. The processor tries to fetch the next few instructions to be executed, so that it can start the process of decoding and executing them. It even does this with jump and return instructions, guessing where the program will head next.

What AMD says here is that, if a ret instruction immediately follows a conditional jump instruction, their predictor cannot figure out where the ret instruction is going. The pre-fetching has to stop until the ret actually executes, and only then will it be able to start looking ahead again.

The "rep ret" trick apparently works around the problem, and lets the predictor do its job. The "rep" has no effect on the instruction.

Source: Some forum, google a sentence to find it.

One thing to note: just because there is no prologue doesn't mean there is no stack. You can still push and pop with ease; it's just that complex stack manipulation becomes difficult.

Functions that don't have a prologue/epilogue are usually dubbed naked. Hackers like to use them a lot because they don't contaminate the stack when you jmp to them; I must confess I know of no other use for them outside optimization. In Visual Studio it's done via:

__declspec(naked)