The relative rounding error for a floating point number x is defined as
e_r = |(round(x) - x) / x| = |round(x)/x - 1| (1)
Assuming that round-to-nearest mode is used for round(x), the absolute rounding error |round(x) - x| is at most 0.5 * ulp(x), where ulp is the unit in the last place,

ulp(x) = 2^E * epsilon,

E is the exponent used for x, and epsilon is the machine epsilon, epsilon = 2^-(p-1), with p the precision (24 for the single-precision and 53 for the double-precision IEEE formats).
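For concreteness, these quantities can be checked for the double-precision format with a small Python sketch (the standard-library math.ulp requires Python 3.9+; the sample values 1.0, 1024.0, and 0.75 are arbitrary):

```python
import math
import sys

# Machine epsilon for double precision: 2^-(p-1) with p = 53.
eps = sys.float_info.epsilon
print(eps == 2.0**-52)                    # True

# ulp(x) = 2^E * epsilon, where E is the exponent of x.
print(math.ulp(1.0)    == 2.0**0  * eps)  # True (E = 0)
print(math.ulp(1024.0) == 2.0**10 * eps)  # True (E = 10)
print(math.ulp(0.75)   == 2.0**-1 * eps)  # True (E = -1)
```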
Using this, the relative error can be expressed for any real number x (using |x| >= 2^E):

e_r = |(round(x) - x) / x| = |round(x) - x| / |x| <= (0.5 * 2^E * 2^-(p-1)) / 2^E = 0.5 * epsilon
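This bound can be illustrated numerically; here is a Python sketch in double precision (x = 1/3 is just an arbitrary real number that is not exactly representable, and Fraction is used so that the error itself is computed exactly):

```python
import sys
from fractions import Fraction

eps = sys.float_info.epsilon              # 2^-52 for IEEE double (p = 53)

x_exact = Fraction(1, 3)                  # an exact real number, not representable in binary
x_rounded = float(x_exact)                # round-to-nearest double

# Exact relative rounding error, per definition (1).
e_r = abs(Fraction(x_rounded) - x_exact) / x_exact
print(float(e_r))                         # about 5.5e-17
print(e_r <= Fraction(eps) / 2)           # True: e_r <= 0.5 * epsilon
```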
The problem is that for denormalized numbers 0 < x < 2^Em, where Em is the minimal exponent (-126 for single precision, -1022 for double precision), the numbers in the range

0 < x <= 0.5 * epsilon * 2^Em

always round to 0!
If round(x) is 0, then by (1)

e_r = |(0 - x) / x| = 1 !

How is the relative error computed for such numbers? Should the relative error even be used for numbers that are rounded to 0?
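To make the problem concrete, here is a Python sketch in double precision (x = 2^-1076 is just an assumed example below half the smallest subnormal, 0.5 * epsilon * 2^Em = 2^-1075):

```python
from fractions import Fraction

smallest_subnormal = Fraction(1, 2**1074)  # 2^-1074, the smallest positive double
x = smallest_subnormal / 4                 # 2^-1076, below half the smallest subnormal

rounded = float(x)                         # round-to-nearest double
print(rounded)                             # 0.0 -- the nearest representable value is zero

# Relative error per definition (1): the whole value is lost, so it is 100%.
e_r = abs(Fraction(rounded) - x) / x
print(float(e_r))                          # 1.0
```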
If round(x) is 0, then "e_r = |(0 - x) / x| = 1" expresses the fact that when a computed result is zero and the ideal mathematical result is non-zero, the relative error is 100%. This is correct. Non-zero numbers that are rounded to zero have a relative error of 100%. – Eric Postpischil

For an operation such as a + b, the ideal mathematical result is a + b, that is, the result of adding a and b with normal mathematics. Conceptually, this produces an intermediate result which is then rounded to the nearest representable value. That representable value is delivered by the computer to a destination (such as a processor register). That delivered value is the computed result. – Eric Postpischil
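That distinction can be seen in a small Python sketch (the values of a and b are chosen here only so that the ideal sum is not representable in double precision):

```python
from fractions import Fraction

a, b = 1.0, 2.0**-53                   # both exactly representable doubles

ideal = Fraction(a) + Fraction(b)      # exact mathematical sum 1 + 2^-53 (needs 54 bits)
computed = a + b                       # the delivered, rounded result

print(computed)                        # 1.0 -- ties-to-even rounds the sum back down
print(float(abs(Fraction(computed) - ideal) / ideal))  # relative error, about 0.5 * epsilon
```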