Rounding IEEE 754 Floating Point Numbers

Question

I understand that the IEEE 754 standard supports 4 rounding modes, and I just want to make sure I understand each of them really well. Are the following examples correct?

• Rounding towards positive infinity: Does that mean we add one to the LSB no matter what it is (0 or 1)?

Say we have 1.1010100000000...00. Rounding towards positive infinity means we do +1, so we get 1.1010100000000...01. Similarly, when we have 1.10101000000..01 and we need to round towards positive infinity, we do +1 and get 1.1010100000...10?

• Rounding towards negative infinity: Do we add 0 to the LSB, meaning we do nothing (so it is the same as truncating mode)? Or do we change the LSB to 0 (if it is 1 we change it to 0, and if it is 0 we keep it 0)?

• Truncating mode: Just chop off the GRS (guard, round, sticky) bits.

• Round towards nearest even: I learned that one from “How to perform round to even with floating point numbers”, and it's pretty clear now.

Pascal Cuoq · Accepted Answer · 2018-04-07T20:22:08

Your question indicates that you are thinking in terms of a practical implementation, but it is easier to think of rounding strategies in simpler, more abstract terms, and then the implementation falls into place naturally.
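As a rough illustration of that abstract view, here is a minimal Python sketch. It assumes a single fixed grid spacing `ulp` (as within one binade of a binary format) and ignores the exponent, overflow and subnormals entirely. The function name `round_to_grid` and the mode strings are hypothetical, chosen only for this example. The exact result `x` is kept as a `Fraction`, the two representable candidates around it are computed, and each mode simply picks one of them.

```python
from fractions import Fraction
import math


def round_to_grid(x: Fraction, ulp: Fraction, mode: str) -> Fraction:
    """Round the exact value x to a multiple of ulp using one of four modes.

    Simplification (assumption): the grid spacing ulp is fixed, so there is
    no exponent, no overflow and no subnormal handling.
    """
    n = math.floor(x / ulp)          # index of the candidate at or below x
    lo = n * ulp                     # candidate closer to -inf
    hi = (n + 1) * ulp               # candidate closer to +inf
    if lo == x:                      # x is representable: return it as-is
        return x
    if mode == "toward_pos_inf":
        return hi
    if mode == "toward_neg_inf":
        return lo
    if mode == "toward_zero":        # truncation: the candidate closer to zero
        return lo if x > 0 else hi
    if mode == "nearest_even":
        below, above = x - lo, hi - x
        if below != above:
            return lo if below < above else hi
        return lo if n % 2 == 0 else hi   # tie: pick the even multiple of ulp
    raise ValueError(f"unknown mode: {mode}")
```

For example, with `ulp = Fraction(1, 4)`, rounding `Fraction(-5, 8)` toward +inf gives `-1/2`, the same as toward zero, while rounding `Fraction(5, 8)` toward -inf gives `1/2`, again the same as toward zero. An exact input such as `Fraction(3, 4)` is returned unchanged in every mode.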

You mostly have it right, except for a couple of details.

• Rounding towards +inf means that if the result is not exact, of the two candidate values, the one closer to +inf should be chosen. First, in sign-magnitude, choosing that candidate only means adding one to the significand as computed if the result is positive. If the result is negative, then “chopping off the GRS bits”, the thing you suggest doing for truncating mode, is correct. (Thinking about it for two seconds should convince you that it is normal for rounding towards +inf and truncating to coincide on negative results.) Second, if the result is exact (which means the GRS bits are all zero in the implementation you are thinking of), the computed bits should be returned as-is, even if the result is positive.

• Similarly, rounding towards -inf coincides with truncating for positive results, and corresponds to adding one to the significand of negative results that are not exact.
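The same rules, expressed at the level of the GRS-bit implementation the question has in mind, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `sig` holds the kept significand bits as an unsigned integer, `grs` packs the guard, round and sticky bits into a 3-bit value (with sticky already being the OR of every lower bit), and `sign` is 0 for a positive result and 1 for a negative one. The function name and mode strings are hypothetical. The function only decides whether to add one to the magnitude. If that addition carries out of the significand (for example 1.11…1 plus one ulp), the caller would still have to renormalize and adjust the exponent, which is not shown.

```python
def round_significand(sign: int, sig: int, grs: int, mode: str) -> int:
    """Return the rounded significand magnitude for a sign-magnitude result.

    sign: 0 for a positive result, 1 for a negative one.
    sig:  kept significand bits, as an unsigned integer.
    grs:  guard, round and sticky bits packed as a 3-bit value (0..7).
    The result may carry out of the significand width; renormalizing
    (and incrementing the exponent) is left to the caller.
    """
    if grs == 0:                     # exact result: return it as-is in every mode
        return sig
    if mode == "toward_zero":        # truncation: just drop the GRS bits
        return sig
    if mode == "toward_pos_inf":     # magnitude grows only for positive results
        return sig + 1 if sign == 0 else sig
    if mode == "toward_neg_inf":     # magnitude grows only for negative results
        return sig + 1 if sign == 1 else sig
    if mode == "nearest_even":
        if grs > 0b100:              # more than half an ulp: round magnitude up
            return sig + 1
        if grs < 0b100:              # less than half an ulp: truncate
            return sig
        return sig + (sig & 1)       # exactly half: round to the even significand
    raise ValueError(f"unknown mode: {mode}")
```

Note how `"toward_pos_inf"` on a negative result, and `"toward_neg_inf"` on a positive one, return `sig` unchanged, exactly like truncation, and how an exact result (`grs == 0`) is never incremented.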
