I have written the following routine, which is supposed to truncate a C++ double at the n'th decimal place.
double truncate(double number_val, int n)
{
double factor = 1;
double previous = std::trunc(number_val); // remove integer portion
number_val -= previous;
for (int i = 0; i < n; i++) {
number_val *= 10;
factor *= 10;
}
number_val = std::trunc(number_val);
number_val /= factor;
number_val += previous; // add back integer portion
return number_val;
}
Usually, this works great... but I have found that with some numbers, most notably those that do not seem to have an exact representation within double, have issues.
For example, if the input is 2.0029, and I want to truncate it at the fifth place, internally, the double appears to be stored as something somewhere between 2.0028999999999999996 and 2.0028999999999999999, and truncating this at the fifth decimal place gives 2.00289, which might be right in terms of how the number is being stored, but is going to look like the wrong answer to an end user.
If I were rounding instead of truncating at the fifth decimal, everything would be fine, of course, and if I give a double whose decimal representation has more than n digits past the decimal point it works fine as well, but how do I modify this truncation routine so that inaccuracies due to imprecision in the double type and its decimal representation will not affect the result that the end user sees?
I think I may need some sort of rounding/truncation hybrid to make this work, but I'm not sure how I would write it.
Edit: thanks for the responses so far but perhaps I should clarify that this value is not producing output necessarily but this truncation operation can be part of a chain of many different user specified actions on floating point numbers. Errors that accumulate within the double precision over multiple operations are fine, but no single operation, such as truncation or rounding, should produce a result that differs from its actual ideal value by more than half of an epsilon, where epsilon is the smallest magnitude represented by the double precision with the current exponent. I am currently trying to digest the link provided by iinspectable below on floating point arithmetic to see if it will help me figure out how to do this.
Edit: well the link gave me one idea, which is sort of hacky but it should probably work which is to put a line like number_val += std::numeric_limits<double>::epsilon()
right at the top of the function before I start doing anything else with it. Dunno if there is a better way, though.
Edit: I had an idea while I was on the bus today, which I haven't had a chance to thoroughly test yet, but it works by rounding the original number to 16 significant decimal digits, and then truncating that:
double truncate(double number_val, int n)
{
bool negative = false;
if (number_val == 0) {
return 0;
} else if (number_val < 0) {
number_val = -number_val;
negative = true;
}
int pre_digits = std::log10(number_val) + 1;
if (pre_digits < 17) {
int post_digits = 17 - pre_digits;
double factor = std::pow(10, post_digits);
number_val = std::round(number_val * factor) / factor;
factor = std::pow(10, n);
number_val = std::trunc(number_val * factor) / factor;
} else {
number_val = std::round(number_val);
}
if (negative) {
number_val = -number_val;
}
return number_val;
}
Since a double precision floating point number only can have about 16 digits of precision anyways, this just might work for all practical purposes, at a cost of at most only one digit of precision that the double would otherwise perhaps support.
I would like to further note that this question differs from the suggested duplicate above in that a) this is using C++, and not Java... I don't have a DecimalFormatter convenience class, and b) I am wanting to truncate, not round, the number at the given digit (within the precision limits otherwise allowed by the double datatype), and c) as I have stated before, the result of this function is not supposed to be a printable string... it is supposed to be a native floating point number that the end user of this function might choose to further manipulate. Accumulated errors over multiple operations due to imprecision in the double type are acceptable, but any single operation should appear to perform correctly to the limits of the precision of the double datatype.
1.23
cannot be represented exactly in binary floating-point. The only time this kind of truncation makes sense is when you're generating a human-readable string, such as"1.23"
, from a floating-point value like1.2345
. – Keith Thompson2.0028999999999999996
, was it an inexact representation of2.0029
or is it an exact representation of2.0028999999999999996
or is it something in between? Computer has no way to do known that. At best you can truncate a floating point number to specified binary digit. You cannot do it for decimal digits. – Yan Zhou