I'm building a floating-point calculator essentially from scratch, and I'm having trouble with the step that aligns the exponents of two numbers when they are not equal.
For instance: 72.5 + 12.25 = 84.75
But my program returns 106.5 instead.
Here is the code for the function that aligns the exponents:
void align(MyStruct* a, MyStruct* b)
{
    if (a->exponent > b->exponent)
    {
        b->exponent = a->exponent;                  // Set b's exponent equal to a's
        b->fraction >>= a->exponent - b->exponent;  // Shift b's mantissa (fraction) bits to the right
    }
    return;
}
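One thing worth checking in the function as posted: the shift amount is computed after b->exponent has already been overwritten, so a->exponent - b->exponent is always 0 and b's fraction is never shifted. That would be consistent with the 106.5: if the add step then sums the two unshifted 23-bit fractions under exponent 133, you get 0x110000 + 0x440000 = 0x550000, i.e. binary 1.1010101 x 2^6 = 106.5. A minimal sketch of the reordering follows. The question doesn't show MyStruct, so the uint32_t field types and the names align_original/align_reordered are assumptions for illustration, and the hidden bit is deliberately left out so that only the ordering change is visible.

```c
#include <stdint.h>

/* Hypothetical layout: the question only says the struct holds three integers. */
typedef struct {
    uint32_t sign;
    uint32_t exponent; /* biased 8-bit exponent */
    uint32_t fraction; /* 23 stored fraction bits, hidden bit NOT included */
} MyStruct;

/* Same statement order as the question: the exponent is overwritten first,
   so the shift amount evaluates to 0 and the fraction is left unchanged. */
void align_original(MyStruct *a, MyStruct *b)
{
    if (a->exponent > b->exponent) {
        b->exponent = a->exponent;
        b->fraction >>= a->exponent - b->exponent; /* a->exponent == b->exponent here */
    }
}

/* Reordered: save the exponent difference before touching b->exponent.
   The hidden bit is still not handled in this version. */
void align_reordered(MyStruct *a, MyStruct *b)
{
    if (a->exponent > b->exponent) {
        uint32_t shift = a->exponent - b->exponent;
        b->exponent = a->exponent;
        b->fraction >>= shift;
    }
}
```

With the reordering, B's fraction becomes 0x088000 (00010001000000000000000), the pattern described in the question; the leading 1 of B's significand still has to be shifted in as well.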
I don't know what I'm doing wrong. The binary representations of the two operands in the example above are:
  0|10000101|00100010000000000000000   (A)
+ 0|10000010|10001000000000000000000   (B)
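As a sanity check on the example, the three fields can be reassembled into a float. This sketch assumes float is an IEEE-754 binary32 type on the target platform, and pack_float is a hypothetical helper name. Decoded this way, A is 72.5 (binary 1.0010001 x 2^6), B is 12.25 (binary 1.10001 x 2^3), and the pattern the online calculators report for the sum is 84.75.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from its sign, biased exponent and
   23-bit fraction fields. Assumes float is IEEE-754 binary32. */
float pack_float(uint32_t sign, uint32_t exponent, uint32_t fraction)
{
    uint32_t bits = (sign << 31)
                  | ((exponent & 0xFFu) << 23)
                  | (fraction & 0x7FFFFFu);
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bits without aliasing problems */
    return f;
}
```

The fraction arguments are just the 23-bit fields written in hex: 00100010000000000000000 is 0x110000, 10001000000000000000000 is 0x440000, and 01010011000000000000000 is 0x298000.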
When I do b->exponent = a->exponent;, I expect b to become
0|10000101|10001000000000000000000
which works. Then I expect the mantissa of b to be shifted right as many times as necessary to make up for the exponent difference (in this case 3). This also appears to work, leaving b as
0|10000101|00010001000000000000000
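The shift in this walkthrough moves only the 23 stored fraction bits, but B's full significand is binary 1.10001, so the leading 1 has to travel right along with them. A small sketch, assuming the hidden bit is made explicit at bit 23 of a 32-bit integer before shifting (shift_fraction and HIDDEN_BIT are hypothetical names, and the input is assumed to be a normal, non-zero-exponent value):

```c
#include <stdint.h>

#define HIDDEN_BIT 0x800000u /* bit 23: the implicit leading 1 of a normal binary32 value */

/* Shift a 23-bit stored fraction right by `shift` (assumed < 32),
   optionally restoring the hidden bit first. */
uint32_t shift_fraction(uint32_t fraction, uint32_t shift, int include_hidden)
{
    uint32_t significand = include_hidden ? (fraction | HIDDEN_BIT) : fraction;
    return significand >> shift;
}
```

For B shifted by 3, leaving out the hidden bit gives 0x088000 (00010001000000000000000), while including it gives 0x188000, i.e. a stored pattern of 00110001000000000000000. The two differ by exactly the 1 that was dropped.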
Up to this point, I would expect to get the correct result, but I don't. Checking with online floating-point calculators, the result of a + b should be
0|10000101|01010011000000000000000
However, when I add my two modified mantissas together, that is not what I get. What am I doing wrong? My only suspicion is that the hidden bit (the leading 1) is not being shifted during the process. Could that be it?
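A sketch of adding two positive binary32 values field by field, with the hidden bit restored in both operands before aligning and a renormalization step for when the sum carries past bit 23. Assumptions, stated loudly: both inputs are positive and normal, the result does not overflow, and bits shifted out of the smaller operand are simply truncated (real IEEE-754 addition rounds to nearest-even using guard, round and sticky bits). The struct layout and the name add_positive are hypothetical.

```c
#include <stdint.h>

#define HIDDEN_BIT 0x800000u /* bit 23: implicit leading 1 */
#define FRAC_MASK  0x7FFFFFu /* the 23 stored fraction bits */

typedef struct {
    uint32_t sign;
    uint32_t exponent; /* biased 8-bit exponent */
    uint32_t fraction; /* 23 stored bits, hidden bit not included */
} MyStruct;

/* Adds two positive, normal values. Truncates instead of rounding. */
MyStruct add_positive(MyStruct a, MyStruct b)
{
    /* Make `a` the operand with the larger (or equal) exponent. */
    if (b.exponent > a.exponent) {
        MyStruct t = a;
        a = b;
        b = t;
    }

    uint32_t shift = a.exponent - b.exponent;    /* computed before anything is modified */
    uint32_t sig_a = a.fraction | HIDDEN_BIT;    /* restore the implicit 1 in both operands */
    uint32_t sig_b = b.fraction | HIDDEN_BIT;
    sig_b = (shift < 32) ? (sig_b >> shift) : 0; /* shifting a uint32_t by >= 32 is undefined */

    uint32_t sum = sig_a + sig_b;                /* at most 25 bits wide */
    uint32_t exponent = a.exponent;
    if (sum & (HIDDEN_BIT << 1)) {               /* carry into bit 24: renormalize */
        sum >>= 1;
        exponent += 1;
    }

    MyStruct r = {0, exponent, sum & FRAC_MASK}; /* drop the hidden bit again */
    return r;
}
```

For the example, sig_a = 0x910000 and sig_b = 0x188000 add to 0xA98000, which needs no renormalization and leaves the stored fraction 0x298000, i.e. 01010011000000000000000 under exponent 10000101, matching the calculators. Adding 1.5 to itself exercises the carry branch and yields 3.0.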
I should mention that my struct consists of three integer fields, one for each part of the IEEE-754 single-precision format (sign, exponent, fraction/mantissa). So the fraction for A, for example, would be 00000000000100010000000000000000 (32 bits instead of 23, but when the fields are combined they form the full float). I'm also fairly confident that my other functions work as intended and that align is the problem.
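For reference, a sketch of splitting a float into the three integer fields described here, assuming IEEE-754 binary32 floats and uint32_t fields (unpack_float is a hypothetical name). The fraction field then holds 9 leading zeros followed by the 23 stored bits, matching the A example above; note that the hidden bit is not stored in any of the fields, so it has to be put back explicitly before shifting.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout matching the description: three integer fields. */
typedef struct {
    uint32_t sign, exponent, fraction;
} MyStruct;

/* Split a binary32 float into sign, biased exponent and 23-bit fraction. */
MyStruct unpack_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* copy the raw bit pattern */
    MyStruct m;
    m.sign = bits >> 31;
    m.exponent = (bits >> 23) & 0xFFu;
    m.fraction = bits & 0x7FFFFFu; /* hidden bit is implicit, not stored */
    return m;
}
```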
Any advice?
If you shift 1.1 right by one bit, it should become 0.11, but you are turning it into 1.01. – Igor Tandetnik
So the int fraction, for example, wouldn't be 000000001|10000000000000000000000 but rather 000000000|10000000000000000000000, and I would just simulate the bit past the 23rd place? – EthanR