6
votes

Consider decimal representations of the form d1.d2d3d4d5...dnExxx where xxx is an arbitrary exponent and both d1 and dn are nonzero.

Is the maximum n known such that there exists a decimal representation d1.d2d3d4d5...dnExxx such that the interval (d1.d2d3d4d5...dnExxx, d1.d2d3d4d5...((dn)+1)Exxx) contains an IEEE 754 double?

n should be at least 17. The question is how much above 17.

This number n has something to do with the number of digits that it is enough to consider in a decimal-to-double conversion such as strtod(). I looked at the source code for David M. Gay's implementation hoping to find an answer there. There is an allusion to “40” but it is unclear whether this is a consequence of a sound mathematical result or just a statistically safe bound. Also the comment about “truncating” makes it sound like 0.5000000000000000000000000000000000000000000000000001 may be converted to 0.5 in round-upwards mode.

Musl's implementation seems to read approximately 125*9 digits, which is a lot. Then it switches to “sticky” mode:

if (c!='0') x[KMAX-4] |= 1;

Finally, how does the answer change when substituting “contains an IEEE 754 double” with “contains the midpoint of two consecutive IEEE 754 doubles”?

1
I'm not sure I understand the point. For example, 2^(-1074) has 751 significant decimal digits, thus there is a decimal representation d1.d2...d750E-324 satisfying the condition (you can get longer ones, but not much). But you only need a handful of these digits to determine the closest IEEE754 double.Daniel Fischer
@DanielFischer But 2^(-1074) is not in the exclusive interval (d1.d2...d750E-324, d1.d2...(d750+1)E-324). By the way (d750+1) is a slight abuse of notation if d750 is a “9”. This abuse is also in my question but the alternative (d1.d2...d750E-324, (d1.d2...d750E-324 + 1E-1074)) is confusing too. I may even get the exponent wrong.Pascal Cuoq
I used one digit less than the exact representation has, so it is in the open interval.Daniel Fischer
I guess your problem is that you want/need to efficiently parse in rounding modes other than "to nearest", right?Daniel Fischer
@DanielFischer I tried to write the question so that the first part, up to the last sentence, was about the conversion in directed modes and the last sentence about the conversion in round-to-nearest. Now it seems to me that the two answers differ widely (750+ and 17) but I still think I somehow managed to ask the right questions.Pascal Cuoq

1 Answers

6
votes

When you have a subnormal number with odd significand, that is, an odd multiple of 2^(-1074), you have a number whose last nonzero digit in the decimal representation is the 1074th after the decimal point. You then have around 300 zeros directly following the decimal point, so the number has around 750-770 significant decimal digits. The smallest positive subnormal, 2^(-1074) has 751 significant digits, and the largest positive subnormal, (2^52-1)*2^(-1074) has 767 significant digits (I think that is the maximum).

So there is at least one sequence d1, ..., d766 of decimal digits such that there is an IEEE754 double in the open interval

(d1.d2...d766E-308, d1.d2...(d766 + 1)E-308)

The answer does not change much if we consider "contains the midpoint of two consecutive IEEE754 doubles", since subnormal doubles have all roughly the same amount of significant decimal digits, and the midpoint of two consecutive such too.

In the worst case, the entire digit sequence must be consumed (consider "0.5000000...0001" with arbitrarily many zeros before the final 1 that determines that the result shall be 0.5 + 0.5^53 and not 0.5 when rounding away from zero or up).

However, there are only

floor(DBL_MANT_DIG * log 2 / log 10) + 2 = 17

significant decimal digits necessary to distinguish between different double values, so a relatively easy, albeit probably not very efficient, method of parsing would be to parse the first (at least 17) digits (and the exponent) to the closest double, and compare the input string with the exact representation of that double value (and its neighbour).