3
votes

I know what the differences between __cdecl and __stdcall are, but I'm not quite sure as to why __stdcall is ignored by the compiler in x64 builds.

The functions in the following code

int __stdcall stdcallFunc(int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

int __cdecl cdeclFunc(int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

int main()
{
    stdcallFunc(1, 2, 3, 4, 5, 6, 7);
    cdeclFunc(1, 2, 3, 4, 5, 6, 7);

    return 0;
}

have enough parameters to exceed the available CPU registers. Therefore, some arguments must be passed via the stack. I'm not fluent in assembly but I noticed some differences between x86 and x64 assembly.

x64

main    PROC
$LN3:
        sub     rsp, 72                             ; 00000048H
        mov     DWORD PTR [rsp+48], 7
        mov     DWORD PTR [rsp+40], 6
        mov     DWORD PTR [rsp+32], 5
        mov     r9d, 4
        mov     r8d, 3
        mov     edx, 2
        mov     ecx, 1
        call    ?stdcallFunc@@YAHHHHHHHH@Z          ; stdcallFunc
        mov     DWORD PTR [rsp+48], 7
        mov     DWORD PTR [rsp+40], 6
        mov     DWORD PTR [rsp+32], 5
        mov     r9d, 4
        mov     r8d, 3
        mov     edx, 2
        mov     ecx, 1
        call    ?cdeclFunc@@YAHHHHHHHH@Z                ; cdeclFunc
        xor     eax, eax
        add     rsp, 72                             ; 00000048H
        ret     0
main    ENDP

x86

_main   PROC
        push    ebp
        mov     ebp, esp
        push    7
        push    6
        push    5
        push    4
        push    3
        push    2
        push    1
        call    ?stdcallFunc@@YGHHHHHHHH@Z          ; stdcallFunc
        push    7
        push    6
        push    5
        push    4
        push    3
        push    2
        push    1
        call    ?cdeclFunc@@YAHHHHHHHH@Z                ; cdeclFunc
        add     esp, 28                             ; 0000001cH
        xor     eax, eax
        pop     ebp
        ret     0
_main   ENDP

  1. The first 4 arguments are, as expected, passed via registers in x64.
  2. The remaining arguments are put on the stack in the same order as in x86.
  3. Contrary to x86, in x64 we don't use push instructions. Instead we reserve enough stack space at the beginning of main and use mov instructions to write the arguments to the stack.
  4. In x64, the stack is not cleaned up after each call, only once at the end of main.

This brings me to my questions:

  1. Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.
  2. Why is there no stack cleanup after the call instructions in x64?
  3. What's the reason that Microsoft chose to ignore __stdcall in x64 assembly? From the docs:

    On ARM and x64 processors, __stdcall is accepted and ignored by the compiler

Here is the example code and assembly.

1
A new calling convention was created for x64, and there was no good reason to intentionally create two incompatible versions, so there's only one. You can read about it here. - Nate Eldredge

1 Answer

5
votes
  1. Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.

That is not the reason. Both of these instructions also exist in x86 assembly language.

Your compiler is probably not emitting push instructions for the x64 code because it must adjust the stack pointer directly anyway, in order to create 32 bytes of "shadow space" for the called function. See this link (which was provided by @NateEldredge) for further information on "shadow space".

Allocating 32 bytes of "shadow space" with push instructions would take four 64-bit push instructions, but only one sub instruction. That is why the compiler prefers the sub instruction. Since it is using sub anyway to create the 32 bytes of shadow space, there is no penalty in changing the operand from 32 to 72. That allocates 72 bytes on the stack, which is enough to also pass 3 parameters on the stack (the other 4 are passed in CPU registers).

I don't understand why it is allocating 72 bytes on the stack, though, as, according to my calculations, it only has to be 56 bytes (32 bytes of "shadow space" and 24 bytes for the 3 parameters that are passed on the stack). Possibly, the compiler is reserving those extra 16 bytes for local variables or for exception handling, which may be optimized away when compiler optimizations are active.
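
To make that arithmetic concrete, here is a minimal sketch. The helper names are hypothetical, and it assumes only integer arguments, each taking one 8-byte stack slot. It also checks the 16-byte stack alignment that the Windows x64 convention requires at a call, which turns out not to explain the extra 16 bytes: 56 would be aligned just as well as 72.

```cpp
#include <cstddef>

// Windows x64 convention: 32 bytes of shadow space, 8-byte stack slots,
// and the first four integer arguments travel in RCX, RDX, R8, R9.
constexpr std::size_t kShadowSpaceBytes = 32;
constexpr std::size_t kSlotBytes = 8;
constexpr std::size_t kRegisterArgs = 4;

// Hypothetical helper: minimum outgoing-argument area for one call
// that passes `arg_count` integer arguments.
constexpr std::size_t outgoing_area_bytes(std::size_t arg_count) {
    return kShadowSpaceBytes +
           (arg_count > kRegisterArgs ? (arg_count - kRegisterArgs) * kSlotBytes : 0);
}

// Hypothetical helper: true if `sub rsp, frame_bytes` leaves RSP 16-byte
// aligned at the call. Assumes nothing else was pushed in the prologue, so
// only the 8-byte return address sits above the frame (as in the x64 main).
constexpr bool keeps_call_aligned(std::size_t frame_bytes) {
    return (frame_bytes + 8) % 16 == 0;
}

static_assert(outgoing_area_bytes(7) == 56, "32 bytes shadow space + 3 * 8 bytes");
static_assert(keeps_call_aligned(56), "56 + 8 = 64 is a multiple of 16");
static_assert(keeps_call_aligned(72), "72 + 8 = 80 is a multiple of 16");
```

Both sizes pass the alignment check, so the remaining 16 bytes are most likely something the unoptimized build reserves for its own use.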


  2. Why is there no stack cleanup after the call instructions in x64?

There is stack cleanup after the call instructions. This is what the line

add rsp, 72

does.

However, the x64 compiler performs the cleanup only once, at the end of the calling function, instead of after every function call (probably for performance). This means that in the x64 code all function calls share the same stack space for their parameters, whereas in the x86 code the space is allocated by the push instructions before each call and released after it: by the callee for __stdcall (which is why no add esp follows the first call in your x86 listing) and by the caller for __cdecl (the add esp, 28 after the second call).
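
As a rough model of the two strategies (a sketch with hypothetical helper names, assuming int arguments only): x86 pushes 4 bytes per argument for every call, while x64 reserves one area sized for the most demanding call in the function.

```cpp
#include <cstddef>

// Bytes pushed for one x86 __cdecl/__stdcall call with `arg_count` int arguments.
constexpr std::size_t x86_pushed_bytes(std::size_t arg_count) {
    return arg_count * 4;
}

// Minimum x64 outgoing area for one call: 32 bytes of shadow space plus
// one 8-byte slot for each argument beyond the first four.
constexpr std::size_t x64_call_area(std::size_t arg_count) {
    return 32 + (arg_count > 4 ? (arg_count - 4) * 8 : 0);
}

// The shared x64 area must fit the largest call made by the function.
template <std::size_t N>
constexpr std::size_t x64_shared_area(const std::size_t (&arg_counts)[N]) {
    std::size_t largest = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t need = x64_call_area(arg_counts[i]);
        if (need > largest) largest = need;
    }
    return largest;
}

// main() in the question makes two calls with seven arguments each.
constexpr std::size_t kCallsInMain[] = {7, 7};

static_assert(x86_pushed_bytes(7) == 28, "matches 'add esp, 28' in the x86 listing");
static_assert(x64_shared_area(kCallsInMain) == 56, "one 56-byte area serves both calls");
```

The x86 figure matches the add esp, 28 in the question's listing. The x64 figure is the minimum under this model, and the compiler is free to reserve more, as it does here.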


  3. What's the reason that Microsoft chose to ignore __stdcall in x64 assembly?

The keywords __stdcall and __cdecl specify 32-bit calling conventions. That's why they are not relevant for 64-bit programs (i.e. x64). On x64, there is only the standard calling convention and the extended __vectorcall calling convention.
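
The listings in the question already show this. In MSVC's decorated names for free functions, the letter after "@@Y" encodes the calling convention: A for __cdecl, G for __stdcall. The x86 build emits ?stdcallFunc@@YG..., but the x64 build emits ?stdcallFunc@@YA..., the same code as for cdeclFunc. Here is a small sketch that reads that letter; convention_of is a hypothetical helper, and it handles only these two codes, not the full decoration scheme.

```cpp
enum class Convention { Cdecl, Stdcall, Unknown };

// Hypothetical helper: find "@@Y" in an MSVC-decorated free-function name
// and classify the calling-convention letter that follows it.
constexpr Convention convention_of(const char* mangled) {
    for (const char* p = mangled; p[0] != '\0'; ++p) {
        // Short-circuit evaluation keeps every read at or before the terminator.
        if (p[0] == '@' && p[1] == '@' && p[2] == 'Y') {
            switch (p[3]) {
                case 'A': return Convention::Cdecl;    // also the default on x64
                case 'G': return Convention::Stdcall;
                default:  return Convention::Unknown;  // other codes not handled here
            }
        }
    }
    return Convention::Unknown;
}

// Names copied from the listings in the question.
static_assert(convention_of("?stdcallFunc@@YGHHHHHHHH@Z") == Convention::Stdcall,
              "x86 build keeps __stdcall");
static_assert(convention_of("?stdcallFunc@@YAHHHHHHHH@Z") == Convention::Cdecl,
              "x64 build decorates it like cdeclFunc");
```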