To provide perhaps a clearer example, on x86_64, compiled with the -O
flag, the function
pub fn leet(a : i128) -> i128 {
a + 1337
}
compiles to
example::leet:
mov rdx, rsi
mov rax, rdi
add rax, 1337
adc rdx, 0
ret
(My original post had u128
rather than the i128
you asked about. The function compiles the same code either way, a good demonstration that signed and unsigned addition are the same on a modern CPU.)
The other listing produced unoptimized code. It’s safe to step through in a debugger, because it makes sure you can put a breakpoint anywhere and inspect the state of any variable at any line of the program. It’s slower and harder to read. The optimized version is much closer to the code that will actually run in production.
The parameter a
of this function is passed in a pair of 64-bit registers, rsi:rdi. The result is returned in another pair of registers, rdx:rax. The first two lines of code initialize the sum to a
.
The third line adds 1337 to the low word of the input. If this overflows, it carries the 1 in the CPU’s carry flag. The fourth line adds zero to the high word of the input—plus the 1 if it got carried.
You can think of this as simple addition of a one-digit number to a two-digit number
a b
+ 0 7
______
but in base 18,446,744,073,709,551,616. You’re still adding the lowest “digit” first, possibly carrying a 1 to the next column, then adding the next digit plus the carry. Subtraction is very similar.
Multiplication must use the identity (2⁶⁴a + b)(2⁶⁴c + d) = 2¹²⁸ac + 2⁶⁴(ad+bc) + bd, where each of these multiplications returns the upper half of the product in one register and the lower half of the product in another. Some of those terms will be dropped, because bits above the 128th don’t fit into a u128
and are discarded. Even so, this takes a number of machine instructions. Division also takes several steps. For a signed value, multiplication and division would additionally need to convert the signs of the operands and the result. Those operations are not very efficient at all.
On other architectures, it gets easier or harder. RISC-V defines a 128-bit instruction-set extension, although to my knowledge no one has implemented it in silicon. Without this extension, the RISC-V architecture manual recommends a conditional branch: addi t0, t1, +imm; blt t0, t1, overflow
SPARC has control codes like the control flags of x86, but you have to use a special instruction, add,cc
, to set them. MIPS, on the other hand, requires you to check whether the sum of two unsigned integers is strictly less than one of the operands. If so, the addition overflowed. At least you’re able to set another register to the value of the carry bit without a conditional branch.