2
votes

I'm designing a single precision floating point multiplier on hardware and I use python to verify my design. This is the single precision floating point format.

s | exponent | mantissa

The multiplied result is

(S1^S2)| E1 + E2 - 127 | Mantissa1 * Mantissa2

I'm having a problem at rounding the number after performing the multiplication of two mantissas. You know, mantissa is 24 bits (23 and 1 hidden bit) and multiply two 24 bits will give an 48 bits. The mantissa field can only contain 23 bits so I tried truncating it, like this: example of mantissa truncating. But the result does not seem right when compare to python float32 multiplication.
So I'm here to ask how python do to the mantissa multiplication thing. Thank you.

3
Base Python does not have a float32 type. Do you mean NumPy’s np.float32? - Eric Postpischil
@EricPostpischil Sorry, I meant float - Bruce Huynh

3 Answers

2
votes

Python (and many other languages) use the IEEE754 default rounding mode, known as "Round to Nearest Ties to Even." You can read more about it here: https://en.wikipedia.org/wiki/IEEE_754#Rounding_rules

1
votes

In this answer, I will explain how to round using the IEEE-754 round-to-nearest-ties-to-even rule for basic 32-bit binary floating-point. This is likely what you want. Although many Python implementations use IEEE-754, this is not required by the Python documentation, and, even in implementations that do use IEEE-754, there may be deviations from the IEEE-754 standard, so you should not rely on Python as an authoritative reference.

Given a 48-bit significand1, let’s number the bits from 0 for the most significant bit to 47 for the least significant bit. (This is reversed from the usual direction but is convenient for this discussion.)

If the product is within the normal range of the floating-point format, then the significand would be rounded:

  • Consider bit 24 and the trailing bits after that. If bit 24 is 0, round down (just use bits 0 to 23). If bit 24 is 1 and any trailing bit is 1, round up (take bits 0 to 23 and add one to them). If bit 24 is 1 and all trailing bits are 0, round down if bit 23 is even (zero), and round up if bit 23 is odd (one).
  • When rounding up, this can overflow bits 0 to 23 (if they were 1111…111, the addition carries out of bit 0). In this case, shift them right one bit and add one to the exponent.

As stated, the above is for normal cases. To handle all cases:

  • Test for zero: If the significand is zero, return zero (with the calculated sign).
  • Normalize the significand: If bit 0 of the significand field is 0, shift the significand left one bit and subtract one from the exponent. Repeat this step until bit 0 of the significand is not 0.
  • Test for overflow: If the true exponent exceeds 127 (biased exponent exceeds 254), produce infinity (with the calculated sign).
  • Limit the exponent: If the true exponent (which I will call e) is less than −126 (biased exponent is less than 1), shift the significand right by −126−e bits, inserting 0s on the left. Set the exponent to −126. Round the significand as above, with the bits numbered from 0 to 47+(−126−e).
  • Test for overflow: If rounding up increased the exponent to 127, produce infinity (with the calculated sign).
  • Produce normal result: If bit 0 of the significand is 1, use bits 1 to 23 for the significand field and e+127 for the exponent field.
  • Produce subnormal result: Otherwise, bit 0 of the significand is 0. In this case, still use bits 1 to 23 for the significand field, but use zero for the exponent field. This exponent field indicates a subnormal number.

Note that the “limit the exponent” step may seem to undo the “normalize the significand” step, making the former unnecessary, but this combination handles all cases of normal and subnormal operands producing normal and subnormal results.

Footnote

1 “Significand” is the term used in the IEEE-754 standard for the fraction portion of a floating-point number. “Mantissa” is an old term for the fraction portion of a logarithm. (Significands are linear. Mantissas are logarithmic.)

0
votes

I've done this exact thing in the past as an exercise for a job application:

Python script to generate random floats and create a test vector of the IEEE754 values and the product:

import bitstring, random 

spread = 10000000
n = 100000

def ieee754(flt):
    b = bitstring.BitArray(float=flt, length=32)
    return b

if __name__ == "__main__":
    with open("TestVector", "w") as f:
        for i in range(n):
            a = ieee754(random.uniform(-spread, spread))
            b = ieee754(random.uniform(-spread, spread))

            # calculate desired product based on 32-bit ieee 754 
            # representation of the floating point of each operand
            ab = ieee754(a.float * b.float)

            f.write(a.bin + "_" + b.bin + "_" + ab.bin + "\n")

Verilog code to do 32-bit floating point multiplication with nearest-even rounding:

/*   Input and outputs are in single-precision IEEE 754 FP format:
     Sign     -   1 bit, position 32
     Exponent -  8 bits, position 31-24
     Mantissa - 23 bits, position 22-1

Not implemented: special cases like inf, NaN
*/

// indices of components of IEEE 754 FP
`define SIGN    31
`define EXP     30:23
`define M       22:0

`define P       24    // number of bits for mantissa (including 
`define G       23    // guard bit index
`define R       22    // round bit index
`define S       21:0  // sticky bits range

`define BIAS    127


module fp_mul(input  wire clk,
              input  wire[31:0] a,
              input  wire[31:0] b,
              output wire[31:0] y);

    reg [`P-2:0] m;
    reg [7:0] e;
    reg s;

    reg [`P*2-1:0] product;
    reg G;
    reg R;
    reg S;
    reg normalized;

    reg state;
    reg next_state = 0;

    parameter STEP_1 = 1'b0, STEP_2 = 1'b1;


    always @(posedge clk) begin
        state <= next_state;
    end


    always @(state) begin

        case(state)
            STEP_1: begin
                // mantissa is product of a and b's mantissas, 
                // with a 1 added as the MSB to each
                product = {1'b1, a[`M]} * {1'b1, b[`M]};

                // get sticky bits by ORing together all bits right of R
                S = |product[`S]; 

                // if the MSB of the resulting product is 0
                // normalize by shifting right    
                normalized = product[47];
                if(!normalized) product = product << 1; 

                next_state = STEP_2;            
                end 

            STEP_2: begin
                // if either mantissa is 0, result is 0 
                if(!a[`M] | !b[`M]) begin 
                    s = 0; e = 0; m = 0;
                end else begin
                    // sign is xor of signs
                    s = a[`SIGN] ^ b[`SIGN];

                    // mantissa is upper 22-bits of product w/ nearest-even rounding
                    m = product[46:24] + (product[`G] & (product[`R] | S));

                    // exponent is sum of a and b's exponents, minus the bias 
                    // if the mantissa was shifted, increment the exponent to balance it
                    e = a[`EXP] + b[`EXP] - `BIAS + normalized;
                end 

                next_state = STEP_1;
                end
        endcase
    end

    // output is concatenation of sign, exponent, and mantissa  
    assign y = {s, e, m};
endmodule 

Hope that helps!