3
votes

Will printf("%.9e", value) always print the exact base-10 representation of value if value is an IEEE single-precision floating-point number (a C/C++ float)?

Will the same hold for printf("%.17e", value) if value is an IEEE double-precision floating-point number (a C/C++ double)?

If not, how can I print it exactly?

It appears that printf("%.17f", value) and printf("%.17g", value) will not.

I think every base-2 number can be represented exactly as a base-10 number, so I think both. I know that not every base-10 number can be represented exactly as a base-2 number, but I'm not concerned about that. I'm assuming that the number already exists as a base-2 number in a float or double. I'm not actually sure how to show an example. – Patrick
An IEEE 754 single-precision float has 23 bits of precision, and 10 only has a single power-of-two factor, so I expect it's possible to find a single-precision float that takes 23 significant decimal digits to represent exactly. – EOF
@ThomasMatthews: How does that relate to the question? – Oliver Charlesworth
@OliverCharlesworth: It explains how the exact base-10 representation of any value is interpreted, emphasis on exact. – Thomas Matthews

2 Answers

3
votes

Will printf("%.9e", value) always print the exact base-10 representation?

No. Consider 0.5, 0.25, 0.125, 0.0625, .... Each value is one-half of the preceding one and needs one more decimal place for each further halving.
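
For example, here is a minimal sketch (assuming a C99 compiler, since it uses a hexadecimal float constant): 2^-20 is exactly representable as a float, yet "%.9e" rounds its decimal expansion away while "%.13e" shows it in full.

#include <stdio.h>

int main(void) {
    /* 2^-20 == 0.00000095367431640625 exactly: 14 significant digits */
    float f = 0x1p-20f;      /* hexadecimal float constant (C99) */
    printf("%.9e\n", f);     /* 9.536743164e-07     : rounded, not exact */
    printf("%.13e\n", f);    /* 9.5367431640625e-07 : the exact value    */
    return 0;
}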

A float, often binary32, can represent values down to about pow(2,-126) (FLT_MIN), and sub-normals even smaller. It takes 126+ decimal places to represent those exactly; even counting only significant digits, the count is 89+. For example, FLT_MIN on one machine is exactly

0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625

FLT_TRUE_MIN, the smallest non-zero sub-normal, needs 149 digits after the decimal point:

0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125

By comparison, FLT_MAX only takes 39 digits.

340282346638528859811704183484516925440
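
As a quick sketch: FLT_MAX has an integer value, so "%.0f" asks for all of its digits. The C standard only guarantees correct rounding to DECIMAL_DIG digits, but common implementations such as glibc print every digit exactly.

#include <float.h>
#include <stdio.h>

int main(void) {
    /* prints 340282346638528859811704183484516925440 on glibc */
    printf("%.0f\n", FLT_MAX);
    return 0;
}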

Rarely is the exact decimal representation of a float needed. Printing it to FLT_DECIMAL_DIG (typically 9) significant digits is sufficient to display it uniquely. Many systems do not print exact decimal representations beyond a few dozen significant digits.

The vast majority of systems I have used print float/double exactly to at least DBL_DIG significant digits (typically 15+). Most do so to at least DBL_DECIMAL_DIG (typically 17+) significant digits.

The related question Printf width specifier to maintain precision of floating-point value gets into these issues.

printf("%.*e", FLT_DECIMAL_DIG - 1, value) will print a float with enough decimal places to scan it back and get the same value (a round-trip).
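
A minimal round-trip sketch, assuming C11 for FLT_DECIMAL_DIG (substitute 9 on older compilers):

#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float original = 1.1f;   /* stored as the nearest float, not exactly 1.1 */
    char buf[32];

    /* FLT_DECIMAL_DIG significant digits are enough for any float to
       survive a trip through text. */
    snprintf(buf, sizeof buf, "%.*e", FLT_DECIMAL_DIG - 1, original);

    float back = strtof(buf, NULL);
    printf("%s round-trips: %s\n", buf, back == original ? "yes" : "no");
    return 0;
}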

2
votes

The IEEE-754 format for a 32-bit floating point number is explained in this Wikipedia article.

The following table shows the decimal weight contributed by each mantissa bit, given that the exponent is 0, meaning
1.0 <= N < 2.0. The last number in the table is the largest number less than 2.0.

From the table, you can see that you need to print at least 23 digits after the decimal point to get the exact decimal number from a 32-bit floating point number.

3f800000 1.0000000000000000000000000   (1)
3fc00000 1.5000000000000000000000000   (1 + 2^-1)
3fa00000 1.2500000000000000000000000   (1 + 2^-2)
3f900000 1.1250000000000000000000000   (1 + 2^-3)
3f880000 1.0625000000000000000000000   (1 + 2^-4)
3f840000 1.0312500000000000000000000   (1 + 2^-5)
3f820000 1.0156250000000000000000000   (1 + 2^-6)
3f810000 1.0078125000000000000000000   (1 + 2^-7)
3f808000 1.0039062500000000000000000   (1 + 2^-8)
3f804000 1.0019531250000000000000000   (1 + 2^-9)
3f802000 1.0009765625000000000000000   (1 + 2^-10)
3f801000 1.0004882812500000000000000   (1 + 2^-11)
3f800800 1.0002441406250000000000000   (1 + 2^-12)
3f800400 1.0001220703125000000000000   (1 + 2^-13)
3f800200 1.0000610351562500000000000   (1 + 2^-14)
3f800100 1.0000305175781250000000000   (1 + 2^-15)
3f800080 1.0000152587890625000000000   (1 + 2^-16)
3f800040 1.0000076293945312500000000   (1 + 2^-17)
3f800020 1.0000038146972656250000000   (1 + 2^-18)
3f800010 1.0000019073486328125000000   (1 + 2^-19)
3f800008 1.0000009536743164062500000   (1 + 2^-20)
3f800004 1.0000004768371582031250000   (1 + 2^-21)
3f800002 1.0000002384185791015625000   (1 + 2^-22)
3f800001 1.0000001192092895507812500   (1 + 2^-23)

3fffffff 1.9999998807907104492187500
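
The table rows can be reproduced with a short sketch, assuming float is the 32-bit IEEE-754 type and that printf produces exact digits this far out (glibc does; the standard only requires correct rounding to DECIMAL_DIG digits):

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a single-precision bit pattern as a float and print it
   the same way as the table above. */
static void show(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    printf("%08" PRIx32 " %.25f\n", bits, f);
}

int main(void) {
    show(0x3f800000);   /* 1.0             */
    show(0x3f800001);   /* 1 + 2^-23       */
    show(0x3fffffff);   /* just below 2.0  */
    return 0;
}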

One thing to note about this is that there are only 2^23 (about 8 million) floating point values between 1 and 2. However, there are 10^23 numbers with 23 digits after the decimal point, so very few decimal numbers have exact floating point representations.

As a simple example, the number 1.1 does not have an exact representation. The two 32-bit float values closest to 1.1 are

3f8ccccc 1.0999999046325683593750000
3f8ccccd 1.1000000238418579101562500
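
A quick way to see which of the two 1.1f actually becomes (again assuming a printf, such as glibc's, that prints exact digits):

#include <stdio.h>

int main(void) {
    float f = 1.1f;          /* stored as 0x3f8ccccd, the nearer neighbour */
    printf("%.25f\n", f);    /* 1.1000000238418579101562500 */
    return 0;
}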