0
votes

I think I know how to convert a decimal number into IEEE 754 single-precision floating-point representation, but I want to make sure.

I want to represent 3.398860921 x 10^18 in IEEE 754 single-precision floating-point representation. I know how float rep. is broken down.

31th digit: sign (0 for + and 1 for -) 30-23th digits: represent the exponent 22-0th digits: represent the mantissa (significand)

so sign is obviously 0 since it's a positive number. For the exponent I came up with this (by adding 18 to 127 for the bias) and represented the exponent as: 1001 0001

For the mantissa which would be the 3.398860921 part, I continuously multiplied everything to the right of the decimal by 2, and if it was greater than 1 I marked a 1, otherwise a 0. Then took the new answer and again multiplied everything to the right of the decimal by 2, until I came up with enough bits to fill the mantissa.

So now I have: 0 | 1001 0001 | 0110 0110 0001 1011 1011 111

so when I convert this into HEX, I get 0x48B30DDF but that is a different number than I began with in the 3.398860921 x 10^18

Is that supposed to be like that or did I make a mistake somewhere? Any help would be greatly appreciated.

1
Why "adding 18"? The exponent is a power of two, not ten.Patricia Shanahan
If you only need to do this for numbers that are actually integers, it may be easier to first get the decimal integer string representation, convert it to binary, and then pack the binary number into float.Patricia Shanahan
1) So what should I be adding instead of 18 to 127 then? 2) So do you mean "3.398860921000000000" and convert that into binary?user2516663
1) The correct thing to add is the number of bits by which the binary value needs to be shifted to get it into the 1.xxxx form. 2) See Pascal Cuoq's answer.Patricia Shanahan

1 Answers

1
votes

You cannot use the decimal exponent for the IEEE 754 representation. The IEEE 754 expects a binary exponent, that is, the number p when the number is represented as 1.xxx… * 2p.

And you cannot use what is the mantissa from the decimal scientific notation directly converted to binary, since it only makes sense in relation to the decimal exponent, that you cannot use directly.

The algorithm is to convert the entire number to binary and then, and then only, for the significand, to take the 23 bits that follow the leading bit. For the exponent, count the position of the leading bit.

For your particular value of 3.398860921 x 1018, the binary representation is 1.0111100101011001011011111111111101101001010010111101×261 according to Wolfram Alpha.

This means that the unbiased exponent is 61 and a tentative significand with leading bit omitted is 01111001010110010110111. You can compute the error of the conversion from decimal to floating-point as 0.0000000000000000000000011111111101101001010010111101×261, and since this error is larger than half the ULP, you should, unless you have reasons to prefer to round downwards, add one to the significand in order to obtain the nearest single-precision value to the original number as expressed in decimal.