I am following this tutorial about assembly.

According to the tutorial (which I also tried locally, and got similar results), the following source code:

int natural_generator()
{
        int a = 1;
        static int b = -1;
        b += 1;              /* (1, 2) */
        return a + b;
}

Compiles to these assembly instructions:

$ gdb static
(gdb) break natural_generator
(gdb) run
(gdb) disassemble
Dump of assembler code for function natural_generator:
push   %rbp
mov    %rsp,%rbp
movl   $0x1,-0x4(%rbp)
mov    0x177(%rip),%eax        # (1)
add    $0x1,%eax
mov    %eax,0x16c(%rip)        # (2)
mov    -0x4(%rbp),%eax
add    0x163(%rip),%eax        # 0x100001018 <natural_generator.b>
pop    %rbp
retq   
End of assembler dump.

(Line number comments (1), (2) and (1, 2) added by me.)

Question: why, in the compiled code, is the static variable b addressed relative to the instruction pointer (RIP), which changes with every instruction (see lines (1) and (2)) and thus seems to produce more complicated assembly code, rather than relative to the specific section of the executable where such variables are stored?

According to the mentioned tutorial, there is such a section:

This is because the value for b is hardcoded in a different section of the sample executable, and it’s loaded into memory along with all the machine code by the operating system’s loader when the process is launched.

(Emphasis mine.)

This makes it position independent, which is useful for shared libraries and ASLR among other things. Also note there is no addressing mode that is "relative to the specific section of the executable", and even addresses in the same section can be relative (common for control transfers). - Jester
"thus generates more complicated assembly code": no, it doesn't. Use objdump -drwC -Mintel to get nice output. -r decodes the symbol table. objdump always does the math for you, and shows the actual target address of a RIP-relative instruction as well as the offset from RIP. - Peter Cordes
The size of the generated instructions matters a great deal: it all needs to come from RAM and get cached in the processor caches. Memory is a significant bottleneck on modern processors. Imagine how well your preferred scheme could work if every instruction that accesses memory also needed 8 bytes to encode the address. Machine code is generated by a machine; it doesn't mind doing a complicated job. - Hans Passant
@PeterCordes You won't normally see a C++ compiler doing initialization of statically allocated variables at run time in cases where you wouldn't see a C compiler doing runtime initialization (i.e. where the C++ initialization would be allowed in C, as C compilers don't normally support runtime initialization of statics). That's the case here, as the variable b isn't initialized in the function. - Ross Ridge
@RossRidge: Right, my comment turned into a mess of confusion because I didn't rewrite it from scratch once I realized that wasn't a problem in this case. I was thinking at first that it looked like way too much asm for such a simple function, but of course that's just because the OP failed to enable optimization. I only noticed when I looked closer and saw no branches, then /facepalm, oh yeah, that's just an int with a constant initializer. - Peter Cordes

1 Answer

There are two main reasons why RIP-relative addressing is used to access the static variable b. The first is that it makes the code position independent, meaning that if it's used in a shared library or position-independent executable, the code can be relocated more easily. The second is that it allows the code to be loaded anywhere in the 64-bit address space without requiring huge 8-byte (64-bit) absolute addresses to be encoded in the instruction, which x86-64 doesn't support anyway, apart from a few special forms of MOV that go through the accumulator.
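The worry in the question that RIP "constantly changes" can be checked with a little arithmetic: a RIP-relative operand resolves to the address of the next instruction plus the signed 32-bit displacement. The sketch below is only an illustration. The target 0x100001018 and the three displacements come from the gdb dump in the question; the next-instruction addresses are my inference, working backwards from that target and assuming the instructions in between are 5, 6, 3 and 6 bytes long (the lengths that match the displacements shrinking by 0xb and then 0x9). The names rip_relative_target, dump_accesses and dump_accesses_agree are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Effective address of a RIP-relative operand. By the time the
 * displacement is applied, RIP already points at the NEXT instruction. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp32)
{
    /* Sign-extend the 32-bit displacement before adding it. */
    return next_insn_addr + (uint64_t)(int64_t)disp32;
}

/* One entry per RIP-relative access in the question's dump. */
struct rip_access {
    uint64_t next_insn_addr; /* inferred, not shown in the dump */
    int32_t  disp32;         /* copied from the dump */
};

static const struct rip_access dump_accesses[] = {
    { UINT64_C(0x100000EA1), 0x177 }, /* (1) load b            */
    { UINT64_C(0x100000EAC), 0x16c }, /* (2) store b           */
    { UINT64_C(0x100000EB5), 0x163 }, /* final add of b to eax */
};

/* Returns 1 if every access resolves to the address gdb printed for b. */
int dump_accesses_agree(void)
{
    const uint64_t b_addr = UINT64_C(0x100001018); /* from gdb's comment */
    size_t n = sizeof dump_accesses / sizeof dump_accesses[0];

    for (size_t i = 0; i < n; i++) {
        if (rip_relative_target(dump_accesses[i].next_insn_addr,
                                dump_accesses[i].disp32) != b_addr)
            return 0;
    }
    return 1;
}
```

Under those assumptions, the displacement gets smaller by exactly as many bytes as RIP has advanced, so all three operands name the same memory location. The assembler and linker do this subtraction; nothing extra happens at run time.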

You mention that the compiler could instead generate code that referenced the variable relative to the beginning of the section it lives in. While it's true that doing this would have the same advantages as given above, it wouldn't make the assembly any less complicated. In fact, it would make it more complicated. The generated assembly code would first have to calculate the address of the section the variable lives in, since it would only know the section's location relative to the instruction pointer. It would then have to keep that address in a register, so that accesses to b (and any other variables in the section) could be made relative to it.

Since 32-bit x86 code doesn't support RIP-relative addressing, your alternate solution is in fact what the compiler does when generating 32-bit position-independent code. It computes the address of the global offset table (GOT) and then accesses the variable b at a fixed offset from the base of the GOT (b itself stays in its data section; the @GOTOFF operand is just its distance from the GOT). Here's the assembly generated by your code when compiled with gcc -m32 -O3 -fPIC -S test.c:

natural_generator:
        call    __x86.get_pc_thunk.cx
        addl    $_GLOBAL_OFFSET_TABLE_, %ecx
        movl    b.1392@GOTOFF(%ecx), %eax
        leal    1(%eax), %edx
        addl    $2, %eax
        movl    %edx, b.1392@GOTOFF(%ecx)
        ret

The call in the first instruction places the address of the following instruction in ECX. The next instruction calculates the address of the GOT by adding the offset of the GOT relative to that instruction. The register ECX now contains the address of the GOT and is used as a base when accessing the variable b in the rest of the code.
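The three steps can be modelled as plain address arithmetic. This is a sketch under loud assumptions: GOT_DELTA and B_GOTOFF are hypothetical values invented for illustration (the real ones are filled in by the linker), and got_base and address_of_b are hypothetical helper names, not anything gcc emits.

```c
#include <stdint.h>

/* Hypothetical link-time constants, chosen only for illustration. */
#define GOT_DELTA 0x1a2c  /* distance from the addl instruction to the GOT */
#define B_GOTOFF  (-0x14) /* b@GOTOFF: address of b minus address of the GOT */

/* Steps 1 and 2: the thunk yields the address of the instruction after
 * the call (the addl), and the addl turns that into the GOT address. */
uint32_t got_base(uint32_t return_address)
{
    return return_address + (uint32_t)GOT_DELTA;
}

/* Step 3: every access to b is "GOT base + fixed offset". */
uint32_t address_of_b(uint32_t return_address)
{
    return got_base(return_address) + (uint32_t)B_GOTOFF;
}
```

Because both constants are distances rather than absolute addresses, loading the code somewhere else shifts the computed address of b by the same amount, which is what makes the sequence position independent. The price is the extra call, the extra add and the register tied up holding the base.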

Compare that to 64-bit code generated by gcc -m64 -O3 -S test.c:

natural_generator:
        movl    b.1745(%rip), %eax
        leal    1(%rax), %edx
        addl    $2, %eax
        movl    %edx, b.1745(%rip)
        ret

(The code is different from the example in your question because optimization is turned on. In general it's a good idea to only look at optimized output, as without optimization the compiler often generates terrible code that does a lot of useless things. Also note that the -fPIC flag doesn't need to be used here, as the compiler generates position-independent 64-bit code for this function regardless.)
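To see what the optimizer did, the 64-bit listing can be transcribed back into C one instruction at a time. This is my own hand translation, not compiler output; the function names and the locals eax and edx are hypothetical and simply mirror the registers in the listing.

```c
/* The function from the question, unchanged apart from its name. */
int natural_generator_original(void)
{
    int a = 1;
    static int b = -1;
    b += 1;
    return a + b;
}

/* Hand-written C that mirrors the -O3 instruction sequence. */
int natural_generator_as_optimized(void)
{
    static int b = -1;

    int eax = b;       /* movl b(%rip), %eax */
    int edx = eax + 1; /* leal 1(%rax), %edx */
    eax += 2;          /* addl $2, %eax      */
    b = edx;           /* movl %edx, b(%rip) */
    return eax;        /* ret                */
}
```

The local a has been folded into the constant 2 (the old b, plus 1 for the increment, plus 1 for a), and b is read from memory once instead of twice. Both versions return the same sequence when called repeatedly.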

Notice that the 64-bit version has two fewer assembly instructions, making it the less complicated of the two. You can also see that it uses one fewer register (ECX). While that doesn't make much of a difference in your code, in a more complicated example that's a register that could have been used for something else, so the 32-bit code gets even more complicated as the compiler has to juggle registers more.