I'm trying to understand IEEE 754 floating-point addition at the binary level. I have followed some example algorithms I found online, and a good number of test cases match a proven software implementation. My algorithm only handles positive numbers at the moment. However, I am not getting a match with this test case:
00001000111100110110010010011100 (1.46487e-33)
00000000000011000111111010000100 (1.14741e-39)
I split each value into sign bit, exponent, and mantissa, and add back the implicit 1 to the mantissa:
0 00010001 1.11100110110010010011100
0 00000000 1.00011000111111010000100
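In code, the unpacking step is essentially this (an illustrative C sketch, not my exact implementation; the helper and variable names are mine):

```c
#include <stdint.h>

/* Split a raw 32-bit pattern into sign, biased exponent, and the 24-bit
   significand with the implicit leading 1 OR'd back in (sketch only). */
static void unpack(uint32_t bits, uint32_t *sign, uint32_t *exponent, uint32_t *mantissa)
{
    *sign     = (bits >> 31) & 0x1u;
    *exponent = (bits >> 23) & 0xFFu;
    *mantissa = (bits & 0x7FFFFFu) | 0x800000u;  /* 23 fraction bits + hidden 1 */
}
```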
I subtract the smaller exponent from the larger to determine the realignment shift amount:
00010001 (17)
-00000000 (0)
=============
17
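Numerically this is just the difference of the biased exponents (sketch; the helper name is mine):

```c
#include <stdint.h>

/* Realignment shift = |exp_a - exp_b|; here 17 - 0 = 17 (sketch). */
static uint32_t shift_amount(uint32_t exp_a, uint32_t exp_b)
{
    return (exp_a > exp_b) ? (exp_a - exp_b) : (exp_b - exp_a);
}
```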
I tack a Guard, Round, and Sticky bit onto each mantissa:
1.11100110110010010011100 000
1.00011000111111010000100 000
I shift the lesser value's mantissa right 17 places, with the sticky bit (the LSB) "sticking" once a 1 is shifted into it:
0.00000000000000001000110 001
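In code, the widening and the alignment shift look roughly like this (a sketch, not my exact implementation; the `align` name is mine). Every bit shifted out of the bottom is OR-reduced into the sticky bit:

```c
#include <stdint.h>

/* Widen a 24-bit significand by three zero bits (Guard, Round, Sticky),
   then shift right, folding every bit shifted out into the sticky bit. */
static uint32_t align(uint32_t mantissa24, uint32_t shift)
{
    uint32_t m = mantissa24 << 3;                       /* 27-bit working value */
    if (shift == 0)
        return m;
    if (shift >= 27)                                    /* everything becomes sticky */
        return (m != 0) ? 1u : 0u;
    uint32_t sticky = ((m & ((1u << shift) - 1u)) != 0) ? 1u : 0u;
    return (m >> shift) | sticky;
}
```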
I add the greater mantissa to the shifted lesser mantissa:
1.11100110110010010011100 000 +
0.00000000000000001000110 001
================================
1.11100110110010011100010 001
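The addition itself is plain integer addition on the 27-bit working values; a carry into bit 27 would signal overflow past 1.x (sketch; the function name is mine):

```c
#include <stdint.h>

/* Add the aligned 27-bit significands (1 integer bit, 23 fraction bits,
   3 GRS bits). A carry into bit 27 means the sum overflowed past 1.x
   and would need a renormalizing right shift. */
static uint32_t add_significands(uint32_t a, uint32_t b, int *overflow)
{
    uint32_t sum = a + b;
    *overflow = (int)((sum >> 27) & 0x1u);
    return sum;
}
```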
Since there was no overflow and the guard bit is 0, I can use the sum's mantissa and the greater exponent directly (removing the implicit '1' again):
0 00010001 11100110110010011100010
Giving a final value of:
00001000111100110110010011100010 (1.46487e-33)
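Packing the fields back together is just the reverse of the unpack step (sketch; the `pack` name is mine, and the GRS bits and hidden 1 are simply dropped):

```c
#include <stdint.h>

/* Reassemble the 32-bit pattern: drop the three GRS bits, mask off the
   hidden 1, and splice sign/exponent/fraction back together (sketch). */
static uint32_t pack(uint32_t sign, uint32_t exponent, uint32_t mantissa_grs)
{
    uint32_t fraction = (mantissa_grs >> 3) & 0x7FFFFFu;
    return (sign << 31) | (exponent << 23) | fraction;
}
```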
But according to my verification implementation, I should be getting:
00001000111100110110010010101000 (1.46487e-33)
So very close but not exact. Is there a mistake in my algorithm?
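For completeness, this is roughly how I obtain the reference bits (a sketch of the verification side, not my exact harness; the hex constants are just my encoding of the bit patterns above):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret the two input bit patterns as floats, add them in hardware
   (round-to-nearest-even by default), and dump the resulting bit pattern. */
int main(void)
{
    uint32_t a_bits = 0x08F3649Cu;  /* 00001000111100110110010010011100 */
    uint32_t b_bits = 0x000C7E84u;  /* 00000000000011000111111010000100 */
    float a, b, sum;
    uint32_t sum_bits;

    memcpy(&a, &a_bits, sizeof a);
    memcpy(&b, &b_bits, sizeof b);
    sum = a + b;
    memcpy(&sum_bits, &sum, sizeof sum_bits);

    printf("%08" PRIX32 "\n", sum_bits);  /* prints 08F364A8, i.e. the expected pattern above */
    return 0;
}
```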