I am calling a NASM 64-bit DLL from ctypes. The dll takes five input parameters. In the Windows calling convention, the first four are passed in rcx, rdx, r8 and r9, and the fifth is passed on the stack.

The Overview of x64 Calling Conventions doc (https://docs.microsoft.com/en-us/cpp/build/overview-of-x64-calling-conventions) says "Any parameters beyond the first four must be stored on the stack, above the shadow store for the first four, prior to the call." The value therefore can't be retrieved with a pop, so I think it has to be read relative to RSP. Since the fifth (and later) parameters are above the shadow store, I guessed it would be at RSP minus 40 (mov rax,[rsp-40]), but it's not.

I tried "walking the stack", meaning I tried rsp-0, rsp-8, rsp-16, and so on down to rsp-56, but none of those locations returned the value I had passed as the fifth parameter (a single 64-bit double).

According to https://docs.microsoft.com/en-us/cpp/build/stack-allocation, the stack layout on entry is return address, rcx, rdx, r8, r9, and the stack parameter area, so I would expect to find my value at rsp-48, but it's not there, nor is it at rsp-56.

So my question is: how do I access a parameter passed on the stack on entry to the dll in the Windows calling convention?

EDIT: Here is the relevant ctypes code:

hDLL = ctypes.WinDLL("C:/Test_Projects/MultiCompare/py_descent.dll")
CallName = hDLL.Main_Entry_fn
CallName.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_double), ctypes.c_double, ctypes.POINTER(ctypes.c_double), ctypes.c_double]
CallName.restype = ctypes.c_double

ret_ptr = CallName(CA_x,CA_d,CA_mu,length_array_out,CA_N_epochs)

Data types:
CA_x:  pointer to double(float) array
CA_d:  pointer to double(float) array
CA_mu:  double
length_array_out:  pointer to double(float) array
CA_N_epochs:  double
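
To make the argument types concrete, here is a minimal sketch of how values of these types can be built and passed through ctypes. It does not load py_descent.dll: `fake_entry` is a hypothetical Python stand-in wrapped in a `ctypes.CFUNCTYPE` prototype with the same five argument types, and the array contents are made-up sample values.

```python
import ctypes

# Same argument and return types as the argtypes/restype shown above.
DoublePtr = ctypes.POINTER(ctypes.c_double)
Prototype = ctypes.CFUNCTYPE(
    ctypes.c_double,  # return type
    DoublePtr, DoublePtr, ctypes.c_double, DoublePtr, ctypes.c_double,
)

def fake_entry(x, d, mu, out, n_epochs):
    # Hypothetical stand-in for the DLL entry point: it only shows that
    # every argument, including the fifth, arrives with the expected value.
    out[0] = x[0] + d[0]
    return mu * n_epochs

entry = Prototype(fake_entry)

# ctypes arrays are accepted wherever POINTER(c_double) is declared.
CA_x = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
CA_d = (ctypes.c_double * 3)(4.0, 5.0, 6.0)
length_array_out = (ctypes.c_double * 1)()

result = entry(CA_x, CA_d, 0.5, length_array_out, 100.0)
print(result, length_array_out[0])
```

With the real DLL, the same argument objects would be passed to `CallName` instead of `entry`.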

Here is the DLL entry point where the variables are retrieved. I always push rdi and rbp on entry, so I read the parameters passed on the stack before doing that, to prevent stack misalignment:

Main_Entry_fn:
; First the stack parameters
movsd xmm0,[rsp+40]
movsd [N_epochs],xmm0
; End stack section
push rdi
push rbp
mov [x_ptr],rcx
mov [d_ptr],rdx
movsd [mu],xmm2
mov [data_master_ptr],r9
; Now assign lengths
; (this part intentionally omitted for brevity)
call py_descent_fn
exit_label_for_Main_Entry_fn:
pop rbp
pop rdi
ret
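
A note on the arithmetic: at entry the fifth parameter sits 40 bytes above RSP (8 bytes of return address plus 32 bytes of shadow space), and every push moves RSP 8 bytes further away from it. A minimal Python sketch of that bookkeeping, where `stack_param_offset` is a hypothetical helper for illustration and not part of the DLL:

```python
RETURN_ADDRESS_BYTES = 8  # pushed by the call instruction
SLOT_BYTES = 8            # every parameter slot is one quadword

def stack_param_offset(param_number, bytes_pushed=0):
    """Offset from the current RSP of a 1-based parameter's stack slot.

    Parameters 1-4 map to their shadow-space (home) slots and parameters
    5 and up map to the slots where the caller stored them.
    bytes_pushed is everything the callee has pushed, or subtracted from
    RSP, since entry.
    """
    return bytes_pushed + RETURN_ADDRESS_BYTES + SLOT_BYTES * (param_number - 1)

# Fifth parameter read at entry, before any push:
print(stack_param_offset(5))
# Fifth parameter after "push rdi" and "push rbp" (2 * 8 bytes):
print(stack_param_offset(5, bytes_pushed=16))
```

Under these assumptions the fifth parameter is at [rsp+40] before the pushes and at [rsp+56] after both of them.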
Things already on the stack are a positive displacement from RSP. How about RSP+40? RSP+0 is the return address, RSP+8 is the start of the 32-byte shadow space, and RSP+40 should be the parameter on the stack. - Michael Petch
mov rax,[rsp+48] doesn't return the value I passed in. - RTC222
Unfortunately, that's not it either. - RTC222
@MichaelPetch According to Parameter Passing, the first float is not passed in XMM0. Quote: "Floating-point and double-precision arguments are passed in XMM0 - XMM3 (up to 4) with the integer slot (RCX, RDX, R8, and R9) that would normally be used for that cardinal slot being ignored (see example) and vice versa." So func(int,int,double,int) would use RCX, RDX, XMM2, R9. A 5th parameter would always be on the stack, e.g. [RSP+40]. - Mark Tolonen
@MarkTolonen: Ah yep, I have been in the Linux world for so long I forgot that the vector registers are positional in the Windows 64-bit convention. - Michael Petch

1 Answer

The links provided were relatively clear, but when things are ambiguous I resort to compiling a C example and looking at the assembly. The result is at the end of this post. The assignments at function entry were:

[RSP]           Return address
[RSP+8]  ECX    int a     (XMM0 unused)
[RSP+16] EDX    int b     (XMM1 unused)
[RSP+24] XMM2   double c  (R8 unused)
[RSP+32] R9     int d     (XMM3 unused)
[RSP+40]        double e    
[RSP+48]        int f

The first four parameters are in registers. The first parameter is in RCX or XMM0 depending on its type, the second in RDX or XMM1, the third in R8 or XMM2, and the fourth in R9 or XMM3. The fifth and later parameters ([RSP+40] and [RSP+48] in this case) are always on the stack. The four quadwords at [RSP+8] through [RSP+32] are the shadow space for the registers. I compiled with no optimization below, so the function immediately spilled the registers to the shadow space.
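
As a rough illustration of the positional rule, here is a small Python model of it. This is only a sketch: `win64_arg_locations` is a made-up name, and it covers only scalar integer/pointer and floating-point arguments of a non-variadic function.

```python
INT_REGS = ["RCX", "RDX", "R8", "R9"]
FLOAT_REGS = ["XMM0", "XMM1", "XMM2", "XMM3"]

def win64_arg_locations(kinds):
    """Map each argument kind ("int" or "float") to its location at entry.

    Stack offsets are relative to RSP immediately after the call
    instruction, before the callee pushes anything or adjusts RSP.
    """
    locations = []
    for slot, kind in enumerate(kinds):
        if slot < 4:
            # Registers are positional: slot n uses either the n-th integer
            # register or the n-th XMM register, and the other one is unused.
            regs = FLOAT_REGS if kind == "float" else INT_REGS
            locations.append(regs[slot])
        else:
            # 8 bytes of return address plus 8 bytes per preceding slot,
            # so the fifth argument (slot 4) lands at RSP+40.
            locations.append(f"[RSP+{8 + 8 * slot}]")
    return locations

# func(int a, int b, double c, int d, double e, int f) from the C example
print(win64_arg_locations(["int", "int", "float", "int", "float", "int"]))
```

For the signature in the question (pointer, pointer, double, pointer, double) the same model gives RCX, RDX, XMM2, R9 and [RSP+40].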

Hope this clears it up.

C example

int func(int a, int b, double c, int d, double e, int f)
{
    return (int)(a+b+c+d+e+f);
}

int main()
{
    func(1,2,1.1,4,5.5,6);
    return 0;
}

Assembly generated:

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.00.24215.1 

include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC  func
PUBLIC  main
PUBLIC  __real@3ff199999999999a
PUBLIC  __real@4016000000000000
EXTRN   _fltused:DWORD
pdata   SEGMENT
$pdata$main DD  imagerel $LN3
        DD      imagerel $LN3+62
        DD      imagerel $unwind$main
pdata   ENDS
;       COMDAT __real@4016000000000000
CONST   SEGMENT
__real@4016000000000000 DQ 04016000000000000r   ; 5.5
CONST   ENDS
;       COMDAT __real@3ff199999999999a
CONST   SEGMENT
__real@3ff199999999999a DQ 03ff199999999999ar   ; 1.1
CONST   ENDS
xdata   SEGMENT
$unwind$main DD 010401H
        DD      06204H
xdata   ENDS
; Function compile flags: /Odtp
; File c:\users\metolone\x.c
_TEXT   SEGMENT
main    PROC

; 7    : {

$LN3:
  00000 48 83 ec 38      sub     rsp, 56                        ; 00000038H

; 8    :     func(1,2,1.1,4,5.5,6);

  00004 c7 44 24 28 06
        00 00 00         mov     DWORD PTR [rsp+40], 6
  0000c f2 0f 10 05 00
        00 00 00         movsd   xmm0, QWORD PTR __real@4016000000000000
  00014 f2 0f 11 44 24
        20               movsd   QWORD PTR [rsp+32], xmm0
  0001a 41 b9 04 00 00
        00               mov     r9d, 4
  00020 f2 0f 10 15 00
        00 00 00         movsd   xmm2, QWORD PTR __real@3ff199999999999a
  00028 ba 02 00 00 00   mov     edx, 2
  0002d b9 01 00 00 00   mov     ecx, 1
  00032 e8 00 00 00 00   call    func

; 9    :     return 0;

  00037 33 c0            xor     eax, eax

; 10   : }

  00039 48 83 c4 38      add     rsp, 56                        ; 00000038H
  0003d c3               ret     0
main    ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
; File c:\users\metolone\x.c
_TEXT   SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
e$ = 40
f$ = 48
func    PROC

; 2    : {

  00000 44 89 4c 24 20   mov     DWORD PTR [rsp+32], r9d
  00005 f2 0f 11 54 24
        18               movsd   QWORD PTR [rsp+24], xmm2
  0000b 89 54 24 10      mov     DWORD PTR [rsp+16], edx
  0000f 89 4c 24 08      mov     DWORD PTR [rsp+8], ecx

; 3    :     return (int)(a+b+c+d+e+f);

  00013 8b 44 24 10      mov     eax, DWORD PTR b$[rsp]
  00017 8b 4c 24 08      mov     ecx, DWORD PTR a$[rsp]
  0001b 03 c8            add     ecx, eax
  0001d 8b c1            mov     eax, ecx
  0001f f2 0f 2a c0      cvtsi2sd xmm0, eax
  00023 f2 0f 58 44 24
        18               addsd   xmm0, QWORD PTR c$[rsp]
  00029 f2 0f 2a 4c 24
        20               cvtsi2sd xmm1, DWORD PTR d$[rsp]
  0002f f2 0f 58 c1      addsd   xmm0, xmm1
  00033 f2 0f 58 44 24
        28               addsd   xmm0, QWORD PTR e$[rsp]
  00039 f2 0f 2a 4c 24
        30               cvtsi2sd xmm1, DWORD PTR f$[rsp]
  0003f f2 0f 58 c1      addsd   xmm0, xmm1
  00043 f2 0f 2c c0      cvttsd2si eax, xmm0

; 4    : }

  00047 c3               ret     0
func    ENDP
_TEXT   ENDS
END