
I'm attempting to develop an FPU (compliant with IEEE 754) as a graduation project, and I'm having some trouble with the sum function. For the last two weeks I've been investigating and working through many operations on paper, with the aim of understanding how the standard works. So, here is the question: I'm confused because when I add numbers with the same sign I get correct results, but I have problems when adding numbers with opposite signs. For example, take the addition of -1200.23 and 500.125. The IEEE 754-2008 single-precision representations of the numbers are (sign bit --- exponent --- significand):

-1200.23 = 1 --- 10001001 --- 1.001 0110 0000 0111 0101 1100 
 500.125 = 0 --- 10000111 --- 1.111 1010 0001 0000 0000 0000
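One quick way to double-check hand-converted bit patterns like these is to let the machine do the conversion. Here is a minimal Python sketch that rounds a value to single precision with the standard `struct` module and splits out the three fields. The helper name `fields` is only for illustration:

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Round x to binary32 and split it into (sign, biased exponent, 23-bit fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

for x in (-1200.23, 500.125):
    s, e, f = fields(x)
    print(f"{x}: {s} --- {e:08b} --- 1.{f:023b}  (exponent {e})")
```

For the two values above, it reproduces the patterns and exponents given in the question.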

The exponents are 137 for -1200.23 and 135 for 500.125. They are not equal, so we need to align the significand of 500.125 with the larger exponent. To do this, I shift its significand right by two places (137 - 135 = 2). The new significand of 500.125 is:

 0.011 1110 1000 0100 0000 0000 
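The same alignment step can be written on integers. This Python sketch stores the significand of 500.125 as a 24-bit integer with the hidden bit made explicit, shifts it right by the exponent difference, and keeps the bits that fall off the end. The variable names are illustrative. In this particular example the dropped bits happen to be zero, so nothing is lost yet, but that is not true in general.

```python
# 24-bit significand of 500.125 with the hidden bit written out (pattern from the question)
sig_b = 0b1_11110100001000000000000   # biased exponent 135

shift = 137 - 135                     # align to the larger exponent (137)
aligned_b = sig_b >> shift            # right shift, as in the question
lost = sig_b & ((1 << shift) - 1)     # the bits pushed out on the right

print(f"aligned: 0.{aligned_b:023b}")   # prints: aligned: 0.01111101000010000000000
print(f"lost:    {lost:0{shift}b}")     # prints: lost:    00
```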

Before continuing, I want to say that I've seen a similar question (How to subtract IEEE 754 numbers?), but it doesn't answer my question.

So, is the next step to add the significands or to subtract them? I've tried both ways, but I still get incorrect results. Thanks.

You don't mention anything about two's complement. If a number is negative, take its two's complement and perform an add. - Stefan Hanke
Are you sure you want to do this? Developing a whole IEEE-compliant FPU is not for the faint of heart, and probably not a project for someone who has to ask how to perform a simple task like adding two numbers with different signs. - Rudy Velthuis


Once the exponents are aligned, these numbers are sign-magnitude values, not two's complement.

If they have different signs, then to add them you have to subtract the smaller shifted significand from the larger one and set the sign according to which operand had the larger magnitude. Then you normalize the result.
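Here is a sketch of that rule on already-aligned integer significands, in Python. The function name and the (sign, magnitude) pair representation are assumptions for illustration, not part of any standard API. The example values are the question's significands, with 500.125 already shifted right by 2.

```python
def add_sign_magnitude(sa: int, ma: int, sb: int, mb: int) -> tuple[int, int]:
    """Add two sign-magnitude numbers whose magnitudes are already aligned.

    sa, sb are sign bits (0 = positive, 1 = negative); ma, mb are the
    non-negative aligned significands. Returns (sign, magnitude).
    """
    if sa == sb:            # same signs: add magnitudes, keep the common sign
        return sa, ma + mb
    if ma > mb:             # opposite signs: larger minus smaller,
        return sa, ma - mb  # and the result takes the sign of the larger
    if mb > ma:
        return sb, mb - ma
    return 0, 0             # exact cancellation gives +0 (round-to-nearest)

# -1200.23 + 500.125, both significands now relative to exponent 137
sign, mag = add_sign_magnitude(1, 0b100101100000011101011100,
                               0, 0b001111101000010000000000)
print(sign, f"{mag:024b}")  # prints: 1 010101111000001101011100
```

The leading bit of the 24-bit result is 0, so normalization shifts the magnitude left once and lowers the exponent from 137 to 136. The result is -700.10498046875, the single-precision sum of the two inputs.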

If you subtract the larger from the smaller, you get a negative, two's complement result, which you would have to negate (and, if necessary, shift) to make it positive again. That is simply unnecessary work.
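To see the extra work the wrong order causes, this small Python sketch models a fixed-width adder by masking to a hypothetical 25-bit width. The width is an assumption chosen for illustration. The sketch subtracts the example's significands the wrong way round:

```python
WIDTH = 25                      # hypothetical adder width: 24-bit significand + 1 spare bit
MASK = (1 << WIDTH) - 1

larger = 0b100101100000011101011100    # significand of |-1200.23|
smaller = 0b001111101000010000000000   # significand of 500.125, aligned

# Subtracting the wrong way round: the fixed-width adder wraps around
# and leaves a two's complement (negative) bit pattern.
wrapped = (smaller - larger) & MASK
is_negative = wrapped >> (WIDTH - 1)   # top bit set: the result went negative

# Recovering the magnitude needs an extra negation step...
recovered = (-wrapped) & MASK
# ...whereas larger - smaller gives it directly.
direct = larger - smaller
print(is_negative, recovered == direct)   # prints: 1 True
```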

If you add the two shifted significands, you certainly don't get what you need, just as -4.1 + 3.2 results in neither 7.3 nor -7.3. It becomes -(4.1 - 3.2), or -0.9.

In other words, the magnitude is determined by subtracting the smaller from the larger (3.2 from 4.1) and the sign by the sign of the larger (in this case, -).

Also, don't forget to handle the hidden bit in the formats that have one: add it before processing and remove it afterward. And take care to handle NaNs and denormals properly, too.
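Here is a Python sketch of unpacking with the hidden bit and the special cases, using `struct`. The function name and the string labels are illustrative only. Normal numbers get the hidden 1 inserted. Zeros and denormals do not, and their exponent is fixed at 1 - bias (-126) rather than 0 - bias. An all-ones exponent marks infinity or NaN.

```python
import struct

def unpack_binary32(x: float) -> tuple[str, int, int, int]:
    """Classify x (rounded to binary32) and return (kind, sign, exponent, significand).

    For finite values: value == (-1)**sign * significand * 2**(exponent - 23),
    with the hidden bit already inserted for normal numbers.
    For infinity/NaN the raw biased exponent (255) and fraction are returned.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if biased == 0xFF:                       # all-ones exponent: infinity or NaN
        return ("nan" if frac else "inf"), sign, biased, frac
    if biased == 0:                          # zero or denormal: no hidden bit,
        kind = "zero" if frac == 0 else "denormal"
        return kind, sign, 1 - 127, frac     # and the exponent is 1 - bias
    return "normal", sign, biased - 127, frac | (1 << 23)   # insert the hidden bit
```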

And do not shift right to align; that way you lose bits. Shift left to align the significands, and normalize later. The intermediate result may be wider, but this is corrected by normalization and proper rounding.
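Putting the pieces together, here is a minimal Python sketch of single-precision addition for finite inputs that follows this advice. It aligns by shifting the operand with the larger exponent left, does the sign-magnitude add or subtract, and then normalizes and rounds to nearest, ties to even. The helper names are hypothetical, and NaN and infinity inputs are not handled. Python's unbounded integers hide the cost of the wide intermediate. A hardware adder has a fixed width, so how many extra bits to carry is a design decision there.

```python
import struct

def decode(x: float) -> tuple[int, int, int]:
    """(sign, exponent, significand) of a finite binary32: value = ±significand * 2**exponent."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign, biased, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if biased == 0:                                # zero or denormal: no hidden bit
        return sign, 1 - 150, frac
    return sign, biased - 150, frac | (1 << 23)    # 150 = bias (127) + 23 fraction bits

def encode(sign: int, exp: int, sig: int) -> float:
    """Normalize and round ±sig * 2**exp to binary32 (round to nearest, ties to even)."""
    if sig == 0:
        return -0.0 if sign else 0.0
    # Bring sig to 24 bits, but never below the smallest exponent (-149, denormals).
    shift = max(sig.bit_length() - 24, -149 - exp)
    if shift > 0:                                  # too wide: drop bits and round
        kept, dropped = sig >> shift, sig & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if dropped > half or (dropped == half and kept & 1):
            kept += 1
        sig, exp = kept, exp + shift
        if sig.bit_length() > 24:                  # rounding carried into a 25th bit
            sig, exp = sig >> 1, exp + 1
    else:                                          # too narrow: shift left, exact
        sig, exp = sig << -shift, exp + shift
    if sig < (1 << 23):                            # still denormal: biased exponent 0
        biased = 0
    else:
        biased = exp + 150
        if biased >= 0xFF:                         # overflow rounds to infinity
            return float("-inf") if sign else float("inf")
    bits = (sign << 31) | (biased << 23) | (sig & 0x7FFFFF)   # drop the hidden bit
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def add_binary32(a: float, b: float) -> float:
    """a + b in binary32 for finite inputs (NaN and infinity are not handled here)."""
    sa, ea, ma = decode(a)
    sb, eb, mb = decode(b)
    e = min(ea, eb)
    ma <<= ea - e                                  # align by shifting LEFT: no bits lost
    mb <<= eb - e
    if sa == sb:
        sign, mag = sa, ma + mb
    elif ma >= mb:                                 # larger minus smaller,
        sign, mag = sa, ma - mb                    # sign of the larger
    else:
        sign, mag = sb, mb - ma
    if mag == 0:
        sign = sa & sb                             # exact zero is +0 unless both are -0
    return encode(sign, e, mag)

print(add_binary32(-1200.23, 500.125))             # prints: -700.10498046875
```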