This topic has come up many times on StackOverflow, but I believe this is a new take. Yes, I have read Bruce Dawson's articles and What Every Computer Scientist Should Know About Floating-Point Arithmetic and this nice answer.
As I understand it, on a typical system there are four basic problems when comparing floating-point numbers for equality:
- Floating point calculations are not exact
- Whether
a-b
is "small" depends on the scale ofa
andb
- Whether
a-b
is "small" depends on the type ofa
andb
(e.g. float, double, long double) - Floating point typically has +-infinity, NaN, and denormalized representations, any of which can interfere with a naïve formulation
This answer -- aka. "the Google approach" -- seems to be popular. It does handle all of the tricky cases. And it does scale the comparison very precisely, checking whether two values are within a fixed number of ULPs of each other. Thus, for example, a very large number compares "almost equal" to infinity.
However:
- It is very messy, in my opinion.
- It is not particularly portable, relying heavily on internal representations, using a union to read the bits from a float, etc.
- It only handles single-precision and double-precision IEEE 754 (in particular, no x86 long double)
I want something similar, but using standard C++ and handling long doubles. By "standard", I mean C++03 if possible and C++11 if necessary.
Here is my attempt.
#include <cmath>
#include <limits>
#include <algorithm>
namespace {
// Local version of frexp() that handles infinities specially.
template<typename T>
T my_frexp(const T num, int *exp)
{
typedef std::numeric_limits<T> limits;
// Treat +-infinity as +-(2^max_exponent).
if (std::abs(num) > limits::max())
{
*exp = limits::max_exponent + 1;
return std::copysign(0.5, num);
}
else return std::frexp(num, exp);
}
}
template<typename T>
bool almostEqual(const T a, const T b, const unsigned ulps=4)
{
// Handle NaN.
if (std::isnan(a) || std::isnan(b))
return false;
typedef std::numeric_limits<T> limits;
// Handle very small and exactly equal values.
if (std::abs(a-b) <= ulps * limits::denorm_min())
return true;
// frexp() does the wrong thing for zero. But if we get this far
// and either number is zero, then the other is too big, so just
// handle that now.
if (a == 0 || b == 0)
return false;
// Break the numbers into significand and exponent, sorting them by
// exponent.
int min_exp, max_exp;
T min_frac = my_frexp(a, &min_exp);
T max_frac = my_frexp(b, &max_exp);
if (min_exp > max_exp)
{
std::swap(min_frac, max_frac);
std::swap(min_exp, max_exp);
}
// Convert the smaller to the scale of the larger by adjusting its
// significand.
const T scaled_min_frac = std::ldexp(min_frac, min_exp-max_exp);
// Since the significands are now in the same scale, and the larger
// is in the range [0.5, 1), 1 ulp is just epsilon/2.
return std::abs(max_frac-scaled_min_frac) <= ulps * limits::epsilon() / 2;
}
I claim that this code (a) handles all of the relevant cases, (b) does the same thing as the Google implementation for IEEE-754 single- and double-precision, and (c) is perfectly standard C++.
One or more of these claims is almost certainly wrong. I will accept any answer that demonstrates such, preferably with a fix. A good answer should include one or more of:
- Specific inputs differing by more than
ulps
Units in Last Place, but for which this function returns true (the bigger the difference, the better) - Specific inputs differing by up to
ulps
Units in Last Place, but for which this function returns false (the smaller the difference, the better) - Any case(s) I have missed
- Any way in which this code relies on undefined behavior or breaks depending on implementation-defined behavior. (If possible, please cite a relevant spec.)
- Fixes for whatever problem(s) you identify
- Any way to simplify the code without breaking it
I intend to place a non-trivial bounty on this question.
factor * limits::epsilon / 2
convertsfactor
to the floating-point type, which causes rounding errors for large values offactor
that are not exactly representable. Likely, the routine is not intended to be used with such large values (millions of ULPs infloat
), so this ought to be specified as a limit on the routine rather than a bug. (Of course, the shift1U << ulps
is also subject to error whenulps
is large, but this is being removed.) – Eric Postpischil