In this compiler output, why does func(int) use its first arg as a pointer, zeroing 24 bytes of pointed-to memory? The arg isn't a pointer

Question

I'm having trouble understanding what this assembly code does (it is a small excerpt from a larger listing, in Intel syntax):

vector<int> func(int i) { ...}  // C++ source

clang output from the Godbolt compiler explorer:

func(int): # @func(int)
    push    rbp
    push    rbx
    push    rax
    mov     ebp, esi
    mov     rbx, rdi
    xorps   xmm0, xmm0
    movups  xmmword ptr [rbx], xmm0
    mov     qword ptr [rbx + 16], 0

This is compiled on Linux, following the official System V AMD64 ABI. According to this link, the rdi register is used to pass the first argument to the function. So on this line

mov rbx, rdi

We move the value of the argument (an int in this case) to rbx. Shortly after, we do:

movups xmmword ptr [rbx], xmm0

And this is what I don't understand. rbx contains the value of the argument, which is an int, and here we are copying the contents of xmm0 to the address pointed to by rbx (but rbx does not contain any address, just the argument of the function!)

I'm clearly getting something wrong, but I can't figure out what.

Objects are returned through a hidden first argument. As such, your function looks more like func(vector<int>* result, int i). rdi is the pointer to where the result will be returned, i is in esi (the second argument slot). — Jester
Just a note that not all objects are returned with a hidden pointer: the ABI lets you pack most objects of 16 bytes or less into the rax and rdx registers as return values, but std::vector is generally 24 bytes so ends up using the hidden pointer. — BeeOnRope

BeeOnRope · Accepted Answer · 2018-06-14T01:14:42

In the SysV 64-bit ABI used by Linux and most other 64-bit x86 operating systems outside of Windows, a struct or class return value is either returned in the rax and rdx registers (one or both), or via a hidden pointer passed as the first argument.

The decision between the two options depends mostly on the size of the returned structure: structures larger than 16 bytes generally use the hidden pointer approach, but there are other factors as well and I recommend this answer for a more comprehensive treatment.
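The size rule can be sketched as a compile-time check. This is a deliberately simplified approximation, not the psABI classification algorithm: it assumes only two criteria, the object's size (at most 16 bytes) and whether it is trivially copyable. The second criterion stands in for the C++ ABI rule that a class with a non-trivial copy/move constructor or destructor is always returned through the hidden pointer, whatever its size. The helper name `fits_in_return_registers` is hypothetical.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical helper: a rough approximation of the SysV x86-64 decision
// for returning class types. The real algorithm also classifies each
// eightbyte (INTEGER, SSE, ...), which this sketch ignores.
template <typename T>
constexpr bool fits_in_return_registers() {
    // Larger than 16 bytes -> returned through the hidden pointer.
    // Not trivially copyable (e.g. a user-provided destructor) -> also
    // returned through memory; std::is_trivially_copyable approximates
    // the ABI's "non-trivial for the purposes of calls" rule.
    return sizeof(T) <= 16 && std::is_trivially_copyable<T>::value;
}

struct four { int x1, x2, x3, x4; };     // 16 bytes
struct five { int x1, x2, x3, x4, x5; }; // 20 bytes
```

Under this sketch, `four` qualifies and `five` does not; `std::vector<int>` fails on both counts, since it is 24 bytes and has a non-trivial destructor.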

When the hidden pointer approach is used, we need a way to pass this pointer to the function. In this case the pointer behaves as if it were the first argument (passed in rdi), which shifts the other arguments into later positions².
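To make the argument shift concrete, here is a hand-lowered sketch of the question's `vector<int> func(int i)`. The name `func_lowered` is hypothetical, and since the real body is elided, the `push_back` is only a placeholder. What it illustrates is the convention: the caller passes the address of uninitialized storage for the result (the role of `rdi`), `i` becomes the second argument (`esi`), and the callee constructs the vector in place and returns that same address (which the ABI places in `rax`).

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical hand-lowered form of `std::vector<int> func(int i)`.
// `result` plays the role of the hidden pointer passed in rdi, and `i`
// now arrives in the second argument slot (esi). A sketch of the calling
// convention, not code the compiler literally produces.
std::vector<int>* func_lowered(std::vector<int>* result, int i) {
    assert(result != nullptr);  // the caller always supplies the slot
    // Construct the return value directly in the caller's storage.
    std::vector<int>* v = new (result) std::vector<int>();
    // Placeholder work: the original function body is elided.
    v->push_back(i);
    // Hand the same address back, as the ABI does in rax.
    return v;
}
```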

We can see this clearly by examining the code generated for functions returning struct objects of 1 through 5 int values (hence 4 through 20 bytes on this platform). The C++ code:

struct one {
    int x;
};

struct two {
    int x1, x2;
};

struct three {
    int x1, x2, x3;
};

struct four {
    int x1, x2, x3, x4;
};

struct five {
    int x1, x2, x3, x4, x5;
};


one makeOne() {
    return {42};
}

two makeTwo() {
    return {42, 52};
}

three makeThree() {
    return {42, 52, 62};
}

four makeFour() {
    return {42, 52, 62, 72};
}

five makeFive() {
    return {42, 52, 62, 72, 82};
}

Results in the following assembly in clang 6.0 (other compilers behave similarly):

makeOne():                            # @makeOne()
        mov     eax, 42
        ret
makeTwo():                            # @makeTwo()
        movabs  rax, 223338299434
        ret
makeThree():                          # @makeThree()
        movabs  rax, 223338299434
        mov     edx, 62
        ret
makeFour():                           # @makeFour()
        movabs  rax, 223338299434
        movabs  rdx, 309237645374
        ret
.LCPI4_0:
        .long   42                      # 0x2a
        .long   52                      # 0x34
        .long   62                      # 0x3e
        .long   72                      # 0x48
makeFive():                           # @makeFive()
        movaps  xmm0, xmmword ptr [rip + .LCPI4_0] # xmm0 = [42,52,62,72]
        movups  xmmword ptr [rdi], xmm0
        mov     dword ptr [rdi + 16], 82
        mov     rax, rdi
        ret

The basic pattern is that up to and including 8 bytes, the struct is returned entirely in rax (including packing multiple smaller values into the 64-bit register), and for objects up to 16 bytes both rax and rdx are used¹.

After that, the strategy changes completely, and we see that a memory write occurs to the location pointed to by rdi - this is the above-mentioned hidden pointer approach.
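The `makeFive()` listing can be mirrored in C++ by writing through an explicit out-pointer. `makeFive_lowered` is a hypothetical name, and this is a sketch of the calling convention rather than anything the compiler emits: the caller owns the 20-byte slot, the callee fills it, and the final `mov rax, rdi` corresponds to returning the same pointer, which the SysV ABI requires for hidden-pointer returns.

```cpp
#include <cassert>

struct five { int x1, x2, x3, x4, x5; };  // 20 bytes, as in the example

// Hypothetical hand-lowered form of `five makeFive()`.
five* makeFive_lowered(five* out) {
    assert(out != nullptr);  // the caller always supplies the slot
    // clang merges these four stores into one 16-byte movups...
    out->x1 = 42;
    out->x2 = 52;
    out->x3 = 62;
    out->x4 = 72;
    // ...and emits a separate 4-byte store at offset 16.
    out->x5 = 82;
    // Equivalent of `mov rax, rdi`: return the caller's address.
    return out;
}
```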

Finally, to wrap it all up, we note that sizeof(vector<int>) is usually 24 bytes on 64-bit platforms, and is definitely 24 bytes on the major C++ compilers on Linux - so the hidden pointer approach applies for vector.
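This also accounts for the 24 zeroed bytes in the question. Assuming `func` begins by default-constructing the vector it returns (the body is elided, so that is an inference), constructing an empty `std::vector<int>` in the caller's slot sets its three internal pointers to null. The struct below is a hypothetical stand-in for that layout (libstdc++ and libc++ both keep begin, end and end-of-capacity pointers), not the real class.

```cpp
#include <cassert>

// Hypothetical mirror of the three pointers inside a typical
// std::vector<int>. Only the layout is modeled, not the behavior.
struct vector_int_layout {
    int* begin;           // first element
    int* end;             // one past the last element
    int* end_of_storage;  // one past the allocated capacity
};

// What constructing an empty vector in the hidden return slot amounts to:
// three null pointers, i.e. 24 zero bytes on x86-64.
vector_int_layout* construct_empty(vector_int_layout* slot) {
    assert(slot != nullptr);
    slot->begin = nullptr;
    slot->end = nullptr;
    slot->end_of_storage = nullptr;
    return slot;
}
```

On x86-64 a null pointer is all-zero bits, so clang can emit these three stores as one 16-byte `movups` of the zeroed `xmm0` plus one 8-byte `mov qword ptr [rbx + 16], 0`, exactly the pattern in the question.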

Credit to Jester who already answered this, in a briefer form, in the comments.

¹ The weird constants like 223338299434 that are being stored into the 64-bit registers are just an optimization: the compiler combines two 32-bit field values into a single 64-bit constant, as in 52ul << 32 | 42ul, which equals 223338299434.
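The packing arithmetic can be checked at compile time. `pack_two_ints` is a hypothetical helper: the first field (`x1`, at offset 0) lands in the low 32 bits because x86-64 is little-endian, and the second field lands in the high 32 bits.

```cpp
#include <cstdint>

// Hypothetical helper reproducing the constant-merging optimization:
// two adjacent 32-bit fields packed into one 64-bit register value.
constexpr std::uint64_t pack_two_ints(std::int32_t low_field,
                                      std::int32_t high_field) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(high_field)) << 32)
         | static_cast<std::uint32_t>(low_field);
}

// rax in makeTwo/makeThree/makeFour: x1 = 42, x2 = 52.
static_assert(pack_two_ints(42, 52) == 223338299434ULL, "rax constant");
// rdx in makeFour: x3 = 62, x4 = 72.
static_assert(pack_two_ints(62, 72) == 309237645374ULL, "rdx constant");
```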

² This is the same approach used to pass this for member functions: in the case that a member function also returns a value that is passed with the hidden pointer approach, the hidden pointer comes first (in rdi), then the this pointer (in rsi) and then finally the first user-provided argument (usually in rdx - but this depends on the type). Here's an example.
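Footnote ² can be sketched the same way. `Widget` and `Widget_make_lowered` are hypothetical names used only for illustration; the lowered function spells out the register order described above: the hidden result pointer first (`rdi`), then `this` (`rsi`), then the user-visible argument (`edx` for an `int`).

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical class: a member function returning a 24-byte std::vector.
struct Widget {
    int base;
    std::vector<int> make(int n) const {
        return std::vector<int>(static_cast<std::size_t>(n), base);
    }
};

// Hand-lowered equivalent of Widget::make (a sketch of the calling
// convention, not compiler output):
//   result -> rdi (hidden return slot)
//   self   -> rsi (the `this` pointer)
//   n      -> edx (first user-provided argument)
std::vector<int>* Widget_make_lowered(std::vector<int>* result,
                                      const Widget* self, int n) {
    assert(result != nullptr && self != nullptr);
    // Build the return value in place and hand the slot back (in rax).
    return new (result) std::vector<int>(static_cast<std::size_t>(n), self->base);
}
```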
