
I compiled the following C code:

typedef struct {
    long x, y, z;
} Foo;

long Bar(Foo *f, long i) {
    return f[i].x + f[i].y + f[i].z;
}

with the command gcc -S -O3 test.c. Here is the Bar function in the output:

    .section    __TEXT,__text,regular,pure_instructions
    .globl  _Bar
    .align  4, 0x90
    pushq   %rbp
    movq    %rsp, %rbp
    leaq    (%rsi,%rsi,2), %rcx
    movq    8(%rdi,%rcx,8), %rax
    addq    (%rdi,%rcx,8), %rax
    addq    16(%rdi,%rcx,8), %rax
    popq    %rbp

I have a few questions about this assembly code:

  1. What is the purpose of "pushq %rbp", "movq %rsp, %rbp", and "popq %rbp", if neither rbp nor rsp is used in the body of the function?
  2. Why do rsi and rdi automatically contain the arguments to the C function (i and f, respectively) without reading them from the stack?
  3. I tried increasing the size of Foo to 88 bytes (11 longs) and the leaq instruction became an imulq. Would it make sense to design my structs to have "rounder" sizes to avoid the multiply instructions (in order to optimize array access)? The leaq instruction was replaced with:

    imulq   $88, %rsi, %rcx

3 Answers

  1. The function is simply building its own stack frame with these instructions. There's nothing really unusual about them. You should note, though, that due to this function's small size, it will probably be inlined when used in the code. The compiler is always required to produce a "normal" version of the function, though. Also, what @ouah said in his answer.

  2. This is because that's how the AMD64 ABI specifies the arguments should be passed to functions.

    If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.

    Page 20, AMD64 ABI Draft 0.99.5 – September 3, 2010

  3. This is not directly related to the structure size as such, but rather to the absolute address that the function has to access. If the size of the structure is 24 bytes, f is the address of the array containing the structures, and i is the index at which the array has to be accessed, then the byte offset of each structure is i*24. Multiplying by 24 in this case is achieved by a combination of lea and SIB addressing. The first lea instruction simply calculates i*3; every subsequent instruction then takes that i*3 and scales it by 8, which yields the needed byte offset, and uses immediate displacements to access the individual structure members ((%rdi,%rcx,8), 8(%rdi,%rcx,8), and 16(%rdi,%rcx,8)). If you make the size of the structure 88 bytes, there is simply no way of doing this swiftly with a combination of lea and any kind of addressing. The compiler simply assumes that a single imulq will be more efficient at calculating i*88 than a series of shifts, adds, leas or anything else.
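The two-step offset calculation described in point 3 can be sketched in C. This is only an illustration of the arithmetic, not what the compiler emits; the helper names foo_offset_like_asm and foo_offset_plain are made up for this example, and it assumes an LP64 target where sizeof(long) is 8, so that sizeof(Foo) is 24.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long x, y, z;
} Foo;

/* Hypothetical helper: byte offset of f[i], computed in the same two
   steps as the generated code. Assumes sizeof(long) == 8 (LP64). */
size_t foo_offset_like_asm(size_t i)
{
    size_t rcx = i + i * 2;  /* leaq (%rsi,%rsi,2), %rcx   -> i * 3  */
    return rcx * 8;          /* scale 8 in (%rdi,%rcx,8)   -> i * 24 */
}

/* The same offset as C defines it: i * sizeof(Foo). */
size_t foo_offset_plain(size_t i)
{
    return i * sizeof(Foo);
}
```

On such a target both helpers should agree for every index, which is why the compiler can split the multiplication by 24 into a lea and an addressing-mode scale.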

  1. What is the purpose of pushq %rbp, movq %rsp, %rbp, and popq %rbp, if neither rbp nor rsp is used in the body of the function?

They keep track of the stack frames so that a debugger can walk them. Add -fomit-frame-pointer to drop them (note that it should be enabled at -O3, but in a lot of gcc versions I have used it is not).

3. I tried increasing the size of Foo to 88 bytes (11 longs) and the leaq instruction became an imulq. Would it make sense to design my structs to have "rounder" sizes to avoid the multiply instructions (in order to optimize array access)?

The leaq instruction is (essentially, and in this case) calculating k*a+b, where "k" is 1, 2, 4, or 8 and "a" and "b" are registers. If "a" and "b" are the same, it can be used for structures of 1, 2, 3, 4, 5, 8, and 9 longs.

Larger structures like 16 longs may be optimizable by calculating the offset with for "k" and doubling, but I do not know if that is what the compiler will actually do; you would have to test.
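As a rough check of the single-instruction rule above, here is a small C sketch. The function name single_lea_multiplier is hypothetical, and it only models the k*a+b form with "a" and "b" equal (or with "b" left out); it says nothing about multi-instruction sequences or about what a particular compiler version will actually choose.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: can i * n be formed by one lea of the form
   b + a*k with a and b both holding i (giving k + 1), or by the
   scale factor k alone? The scale k must be 1, 2, 4, or 8. */
bool single_lea_multiplier(long n)
{
    static const long scales[] = {1, 2, 4, 8};
    for (int j = 0; j < 4; j++) {
        if (n == scales[j] || n == scales[j] + 1)
            return true;
    }
    return false;
}
```

Under this model the function returns true for 1, 2, 3, 4, 5, 8, and 9, and false for 11, which matches the 88-byte struct falling back to imulq.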