Why does GCC use multiplication by a strange number in implementing integer division?

Question

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C:

File division.c

#include <stdlib.h>
#include <stdio.h>

int main()
{
    size_t i = 9;
    size_t j = i / 5;
    printf("%zu\n",j);
    return 0;
}

And then generating assembly language code with:

gcc -S division.c -O0 -masm=intel

But looking at generated division.s file, it doesn't contain any div operations! Instead, it does some kind of black magic with bit shifting and magic numbers. Here's a code snippet that computes i/5:

mov     rax, QWORD PTR [rbp-16]   ; Move i (=9) to RAX
movabs  rdx, -3689348814741910323 ; Move some magic number to RDX (?)
mul     rdx                       ; Multiply 9 by magic number
mov     rax, rdx                  ; Take only the upper 64 bits of the result
shr     rax, 2                    ; Shift these bits 2 places to the right (?)
mov     QWORD PTR [rbp-8], rax    ; Magically, RAX contains 9/5=1 now, 
                                  ; so we can assign it to j

What's going on here? Why doesn't GCC use div at all? How does it generate this magic number and why does everything work?

gcc optimizes divisions by constants, try divisions by 2,3,4,5,6,7,8 and you will most likely see very different code for each case. — Jabberwocky
Note: Magic number -3689348814741910323 converts to CCCCCCCCCCCCCCCD as a uint64_t or just about (2^64)*4/5. — chux - Reinstate Monica
@qiubit : The compiler will nor perversely generate inefficient code just because optimisation is disabled. A trivial "optimisation" that does not involve code reordering or variable elimination will be performed regardless for example. Essentially a single source statement will translate to the most efficient code for that operation in isolation. The compiler optimisation takes into account the surrounding code rather then just the single statement. — Clifford
Some compilers actually will perversely generate inefficient code because optimization is disabled. In particular, they'll do it to make debugging easy, like the ability to set breakpoints on individual lines of code. GCC is, in fact, rather unusual in that it doesn't have a true "no optimizations" mode, because many of its optimizations are constitutively turned on. This is an example of where you can see that with GCC. Clang, on the other hand, and MSVC, will emit a div instruction at -O0. (cc @ clifford) — Cody Gray♦

Sneftel Sneftel · Accepted Answer · 2016-12-16T12:09:40

Integer division is one of the slowest arithmetic operations you can perform on a modern processor, with latency up to the dozens of cycles and bad throughput. (For x86, see Agner Fog's instruction tables and microarch guide).

If you know the divisor ahead of time, you can avoid the division by replacing it with a set of other operations (multiplications, additions, and shifts) which have the equivalent effect. Even if several operations are needed, it's often still a heck of a lot faster than the integer division itself.

Implementing the C / operator this way instead of with a multi-instruction sequence involving div is just GCC's default way of doing division by constants. It doesn't require optimizing across operations and doesn't change anything even for debugging. (Using -Os for small code size does get GCC to use div, though.) Using a multiplicative inverse instead of division is like using lea instead of mul and add

As a result, you only tend to see div or idiv in the output if the divisor isn't known at compile-time.

For information on how the compiler generates these sequences, as well as code to let you generate them for yourself (almost certainly unnecessary unless you're working with a braindead compiler), see libdivide.

Why does GCC use multiplication by a strange number in implementing integer division?

File division.c

5 Answers