fixed point multiplication without 64 bit temporary

Question

Hi I'm implementing some fixed point math stuff for embedded systems and I'm trying to do the multiplication of two 16.16 fixed point numbers without creating a 64bit temporary. So far here is the code I came up with that generates the least instructions.

int multiply(int x, int y){
    int result;
    long long temp = x;
    temp *= y;
    temp >>= 16;
    result = temp;
    return result;
}

the problem with this code is that it uses a temporary 64 bit integer which seem to generate bad assembly code. I'm trying to make a system that uses two 32 bit integers instead of a 64 bit one. Anyone know how to do this?

Did you see a 64-bit temporary in the disassembly dump of the output, or just in your head? Just because the source code has a variable of type long long does not mean there's actually any "64-bit temporary". On any decent 32-bit arch, a 32x32 multiply automatically generates a 64-bit result, usually separated into two 32-bit registers for your (or the compiler's) convenience. Manipulating these is much more efficient than breaking it down into 4 16x16 multiplies to avoid the "64-bit temporary". — R.. GitHub STOP HELPING ICE
@R.. yeah I looked into the assembler dump. it uses two halves in the EAX and EDX register but it's implementation is not optimal — PgrAm
It's surely a lot less suboptimal than performing four MULs... — R.. GitHub STOP HELPING ICE
@R.. the idea is that now that I know how to do it in C I can implement it in hand optimized assembly — PgrAm
most modern x86 compilers will automatically use the upper 32 bits of the result if you specify the optimization level high enough — phuclv

Doug Currie Doug Currie · Accepted Answer · 2013-02-27T22:35:29

Think of your numbers as each composed of two large "digits."

  A.B
x C.D

The "base" of the digits is the 2^bit_width, i.e., 2^16, or 65536.

So, the product is

D*B       + D*A*65536 + C*B*65536 + C*A*65536*65536

However, to get the product shifted right by 16, you need to divide all these terms by 65536, so

D*B/65536 + D*A       + C*B        + C*A*65536

In C:

uint16_t a = x >> 16;
uint16_t b = x & 0xffff;
uint16_t c = y >> 16;
uint16_t d = y & 0xffff;

return ((d * b) >> 16) + (d * a) + (c * b) + ((c * a) << 16);

The signed version is a bit more complicated; it is often easiest to perform the arithmetic on the absolute values of x and y and then fix the sign (unless you overflow, which you can check for rather tediously).

fixed point multiplication without 64 bit temporary

1 Answers