Fixed point multiplication in assembly (x86)

Question

I want to multiply and divide an unsigned 8.8 fixed-point number in the ax register with 1.00125 and store the result in ax as well.

I know that fixed point multiplication/division requires some extra steps but I have no idea how to implement those in assembly.

Help is greatly appreciated.

Start by writing pseudocode, or higher-level language. Figure out the sequence of mathematical operations that will be needed (multiply, divide, add, subtract, shift, mask). Once you have the algorithm clearly defined, then think about how to implement it in assembly. Look up what instruction or sequence can be used to accomplish each step of the algorithm. — Nate Eldredge
Note that x86 has widening multiply, and similarly division with a dividend twice as wide as the quotient and divisor, so if you use those instead of shift-and-add you can take advantage of that to get the high-half product easily. 0.00125 is less than 1/256, and if you tried to represent it as an 8.8 fixed-point number it would round to 0. And it's not a 2^-n binary fraction so it's not just a simple right-shift and add. — Peter Cordes
The easiest way is probably to use floating point numbers and then to just convert the result back to 8.8 fixed point. — fuz

Brendan Brendan · Accepted Answer · 2021-11-01T04:30:42

If you care about accuracy, 1.00125 can't be stored exactly in any integer format or in any floating point format because it's a recursive fraction in binary (in binary it's 1.000000000101000111101011100001010001111010111...b where that 00001010001111010111 sequence repeats forever). For this reason I'd convert it into the rational number 801/800; and then do x * 1.00125 = (x * 801) / 800 (possibly with "round to nearest" on the division).

If you don't care about accuracy, then the more bits you can use for "1.00125" the closer the result will be to the correct answer. With 8 bits ("1.7 fixed point") the closest you can get is 1.0000000b, which means you can just skip the multiplication (x * 1.00125 = x). With 16 bits ("1.15 fixed point") the closest you can get is 1.000000000101001b (or 1.001220703125 in decimal).

However, you can cheat more. Specifically, you can significantly increase accuracy with the same number of bits by doing (x * 1) + (x * 0.00125). E.g. instead of having a 16 bit constant like 1.000000000101001b (where 9 bits are zeros), you can have a 16-bit constant like 0.0000000001010001111010111b (where the 16 bits are the last 16 bits without any of the leading zeros). In this case the constant is very close (like 0.00124999880) rather than "less close" (like 1.001220703125 was).

Ironically; with only 16 bits, this "0.00125" is more accurate than a 32-bit floating point representation of 1.00125 can be.

So.. in assembly (assuming everything is unsigned) it might look like:

    ;ax = x << 8 (or x as an 8.8 fixed point number)

    mov cx,ax         ;cx = x << 8

    mov bx,41943      ;bx = 41943 = 0.00124999880 << 25
    mul bx            ;dx:ax = (x << 8) * (0.00124999880 << 25) = x * 0.00124999880 << 33
                      ;dx = x * 0.00124999880 << 17
    shr dx,9          ;dx = x * 0.00124999880 << 17 >> 9 = x * 0.00124999880 << 8, carry flag = last bit shifted out
    adc dx,0          ;Round up to nearest (add 1 if last bit shifted out was set)

    lea ax,[dx+cx]    ;ax = x << 8 + x * 0.00124999880 << 8 = x * 1.00124999880 << 8

Of course the problem here is that converting it back to "8.8 fixed point" ruins most of the accuracy anyway. To keep most of the accuracy, you could use a 32-bit result ("8.24 fixed point") instead. This might look like:

    ;ax = x << 8 (or x as an 8.8 fixed point number)

    mov cx,ax         ;cx = x << 8

    mov bx,41943      ;bx = 41943 = 0.00124999880 << 25
    mul bx            ;dx:ax = (x << 8) * (0.00124999880 << 25) = x * 0.00124999880 << 33

    add ax,1 << 8     ;To cause the following shift to round to nearest
    adc dx,0

    shrd ax,dx,9
    shr dx,9          ;dx:ax = x * 0.00124999880 << 33 >> 0 = x * 0.00124999880 << 24

                      ;cx:0 = x << 24
    add dx,cx         ;dx:ax = x << 24 + x * 0.00124999880 << 24 = x * 1.00124999880 << 24

The other problem is that there's potential overflow. E.g. if x was 0xFF.FF (or about 255.996) the result would be something like 256.32 which is too big to fit in an "8.8" or "8.24" or "8.anything" fixed point format. To avoid that problem you can just increase the number of integer bits (and reduce the accuracy by 1 bit) - e.g. make the result "9.7 fixed point", or "9.23 fixed point".

The important points here are:

a) For "fixed point" calculations, every operation (multiplication, division, addition, ...) causes the decimal point to move.

b) Because the decimal point keeps moving, it's best to adopt a standard notation for where the decimal point is at each step. My way is to include an "explicit shift" in the comments (e.g. "x << 8" rather than just "x"). This "explicit shift documented in the comments" makes it easy to determine where the decimal point moves, and if/how much you need to shift left/right to convert to a different fixed point format.

c) For good code, you need to pay attention to accuracy and overflow, and this causes the decimal point to move around even more (and makes the use of a "standard notation for where the decimal point is" more important).

Fixed point multiplication in assembly (x86)

3 Answers