Why does the VC++ compiler MOV+PUSH args instead of just PUSH them? x86

Question

In this disassembly from VC++ a function call is being made. The compiler MOVs the local pointers to a register before pushing them:

    memcpy( nodeNewLocation, pNode, sizeCurrentNode );
0041A5DA 8B 45 F8             mov         eax,dword ptr [ebp-8]  
0041A5DD 50                   push        eax  
0041A5DE 8B 4D 0C             mov         ecx,dword ptr [ebp+0Ch]  
0041A5E1 51                   push        ecx  
0041A5E2 8B 55 D4             mov         edx,dword ptr [ebp-2Ch]  
0041A5E5 52                   push        edx  
0041A5E6 E8 67 92 FF FF       call        00413852  
0041A5EB 83 C4 0C             add         esp,0Ch

Why not just push them directly? I.e.,

push  dword ptr [ebp-8]

Also, if you are going to do a separate push, why not do it manually? In other words, instead of doing "push eax" above, do

mov [esp], eax

And so on. The advantage is that after the three MOVs you can do a single subtract to set the new stack pointer, instead of implicitly subtracting three times with the pushes.
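
To make the comparison concrete, here is a toy C++ model of the two sequences. It is only a sketch: `ToyStack`, `with_pushes` and `with_stores` are made-up names, the stack is an array of 4-byte slots rather than real machine state, and it says nothing about speed. One assumption to flag: the store variant reserves the space first and then writes at non-negative offsets from ESP, so nothing is ever written below the stack pointer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of a 32-bit stack that grows downward; not real machine state.
struct ToyStack {
    std::array<std::uint32_t, 16> mem{};  // each slot stands for one 4-byte cell
    std::size_t esp = 16;                 // slot index of the stack top; 16 == empty

    // push reg: move ESP down one slot, then store.
    void push(std::uint32_t value) { mem[--esp] = value; }

    // mov [esp + slot*4], reg: store relative to ESP without moving it.
    void store(std::size_t slot, std::uint32_t value) { mem[esp + slot] = value; }

    // sub esp, slots*4: reserve space with a single adjustment.
    void reserve(std::size_t slots) { esp -= slots; }
};

// Three pushes: ESP is implicitly adjusted three times.
ToyStack with_pushes(std::uint32_t size, std::uint32_t src, std::uint32_t dst) {
    ToyStack s;
    s.push(size);
    s.push(src);
    s.push(dst);
    return s;
}

// One explicit adjustment, then three plain stores.
ToyStack with_stores(std::uint32_t size, std::uint32_t src, std::uint32_t dst) {
    ToyStack s;
    s.reserve(3);
    s.store(2, size);  // same slot the first push would have written
    s.store(1, src);
    s.store(0, dst);   // last argument pushed ends up at [esp]
    return s;
}
```

Both functions leave the same values in the same slots with the same final stack pointer, so the two forms are interchangeable as far as the callee is concerned. Which one is faster is a separate question that this model does not address.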

UPDATE: Release version

This is the same code compiled for release:

; 741  :    memcpy( nodeNewLocation, pNode, sizeCurrentNode );

  00087 8b 45 f8         mov     eax, DWORD PTR _sizeCurrentNode$[ebp]
  0008a 8b 7b 04         mov     edi, DWORD PTR [ebx+4]
  0008d 50               push    eax
  0008e 56               push    esi
  0008f 57               push    edi
  00090 e8 00 00 00 00   call    _memcpy
  00095 83 c4 0c         add     esp, 12            ; 0000000cH

Definitely more efficient than the debug version, but it is still doing a MOV/PUSH combo.
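
For anyone who wants to compare the two builds themselves, a small stand-in along these lines should be enough. Only the `memcpy` call and the three variable names come from my code. The function name `relocate_node` and its signature are made up for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the original code: only the memcpy call and the
// three variable names come from the question; everything else is assumed.
void* relocate_node(void* nodeNewLocation, const void* pNode,
                    std::size_t sizeCurrentNode) {
    // The call site whose argument setup is under discussion.
    std::memcpy(nodeNewLocation, pNode, sizeCurrentNode);
    return nodeNewLocation;
}
```

Compiling this from a 32-bit (x86) build environment with `cl /c /Od /FAcs` and again with `cl /c /O2 /FAcs` writes a `.cod` listing containing source lines, machine code and assembly for each build. The exact instructions will vary with the compiler version, and an optimized build may expand `memcpy` inline, in which case no `call` appears at all.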

Is that actually compiled in release mode? It looks vaguely debuggish — harold
It is compiled for debug. Why would that make a difference in this case? — Tyler Durden
Because the compiler is not going to care about such things in debug mode. — harold
In your final example, is it safe to leave the stack temporarily unbalanced by deferring the sub? I know that would be bad news in real mode (interrupt "borrows" part of your stack at an inopportune time), but I am not certain in protected mode. — Brian Knoblauch
By decoupling the instructions, you reduce the number of register stalls. — Raymond Chen

Hans Passant · Accepted Answer · 2012-10-29T16:44:00

This is an optimization. It is explicitly mentioned in the Intel processor manuals, volume 4, section 12.3.3.6:

In Intel Atom microarchitecture, using PUSH/POP instructions to manage stack space and address adjustment between function calls/returns will be more optimal than using ENTER/LEAVE alternatives. This is because PUSH/POP will not need MSROM flows and stack pointer address update is done at AGU. When a callee function need to return to the caller, the callee could issue POP instruction to restore data and restore the stack pointer from the EBP.

Assembly/Compiler Coding Rule 19. (MH impact, M generality) For Intel Atom processors, favor register form of PUSH/POP and avoid using LEAVE; Use LEA to adjust ESP instead of ADD/SUB.

The rest of the manual isn't that clear about the reason, but it does mention a possible 3-cycle AGU stall on implicit ESP adjustments.
