
The relative rounding error for a real number x that is rounded to a floating-point number round(x) is defined as

e_r = |(round(x) - x) / x| = |round(x)/x - 1| (1)

Assuming round-to-nearest mode is used for round(x), the absolute rounding error |round(x) - x| is at most 0.5 ulp(x), where ulp is the unit in the last place

ulp = 2^E * epsilon

where E is the exponent used for x, and epsilon = 2^-(p-1) is the machine epsilon for precision p (24 for the IEEE single precision format and 53 for double precision).
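As a quick sanity check of this ulp definition, here is a minimal Python sketch for double precision (p = 53). The helper name `ulp_from_definition` is made up for illustration, it only handles normalized non-zero x, and `math.ulp` needs Python 3.9 or later.

```python
import math
import sys

P = sys.float_info.mant_dig        # precision p: 53 for IEEE double
EPSILON = 2.0 ** -(P - 1)          # machine epsilon 2^-(p-1)

def ulp_from_definition(x):
    """ulp = 2^E * epsilon for a normalized, non-zero double x (hypothetical helper)."""
    # math.frexp gives x = m * 2**e with 0.5 <= |m| < 1, so the exponent E
    # in the 1.f * 2^E convention is e - 1.
    _, e = math.frexp(x)
    return math.ldexp(EPSILON, e - 1)

# Compare the definition against the library's own ulp for a few normalized values.
for x in (1.0, 1.5, 0.1, 1e10):
    assert ulp_from_definition(x) == math.ulp(x)
```

The comparison against `math.ulp` is only meant to confirm that the formula and the library agree on normalized values; subnormal inputs would need the exponent clamped to Em.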

Using this, and the fact that |x| >= 2^E, the relative error can be bounded for any real number x in the normalized range

e_r = |(round(x) - x) / x| = |round(x) - x| / |x| <= (0.5 * 2^E * 2^-(p-1)) / 2^E = 0.5 epsilon
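The bound can be checked numerically with exact rational arithmetic. This is a sketch, assuming CPython, where converting a `Fraction` to `float` is correctly rounded to nearest; the helper name `relative_rounding_error` and the sample values are my own choices.

```python
from fractions import Fraction
import sys

HALF_EPSILON = Fraction(sys.float_info.epsilon) / 2   # 0.5 * epsilon, held exactly

def relative_rounding_error(x):
    """Exact e_r for a non-zero Fraction x rounded to the nearest double."""
    rounded = Fraction(float(x))    # float() rounds x; Fraction() recovers the double exactly
    return abs((rounded - x) / x)

# A few reals (as exact rationals) that lie in the normalized double range.
samples = [Fraction(1, 3), Fraction(1, 10), Fraction(22, 7), Fraction(10**30 + 1, 7)]
for x in samples:
    assert relative_rounding_error(x) <= HALF_EPSILON
```

Using `Fraction` for both x and round(x) keeps the error computation itself free of rounding, so the check measures only the single rounding step.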

The problem is that among the denormalized numbers 0 < x < 2^Em, where Em is the minimal exponent (-126 for single precision, -1022 for double), those that satisfy

0 < x <= 0.5 * epsilon * 2^Em

always round to 0!

If round(x) is 0, then by (1)

e_r = |(0 - x) / x| = |-1| = 1!
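To make the jump concrete, here is a small Python sketch for double precision. It assumes CPython's correctly rounded `Fraction`-to-`float` conversion; `relative_rounding_error` and the two sample values are hypothetical names picked for illustration.

```python
from fractions import Fraction

# Smallest positive double: epsilon * 2^Em = 2^-52 * 2^-1022 = 2^-1074.
SMALLEST_SUBNORMAL = Fraction(1, 2**1074)

def relative_rounding_error(x):
    """Exact e_r from (1) for a non-zero Fraction x rounded to the nearest double."""
    rounded = Fraction(float(x))
    return abs((rounded - x) / x)

quarter = SMALLEST_SUBNORMAL / 4             # below half the smallest subnormal
three_quarters = 3 * SMALLEST_SUBNORMAL / 4  # above half the smallest subnormal

assert float(quarter) == 0.0                          # rounds to zero
assert relative_rounding_error(quarter) == 1          # so e_r is exactly 1
assert relative_rounding_error(three_quarters) == Fraction(1, 3)
```

The second value rounds up to the smallest subnormal instead of to zero, and its relative error is already 1/3, so the error degrades throughout the denormalized range rather than only at the point where the result becomes 0.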

How is the relative error computed for such numbers? Should the relative error even be used for numbers that are rounded to 0?

Given round(x) is 0, then “e_r =|(0 - 1) / 1| = |1| !” expresses the fact that when a computed result is zero and the ideal mathematical result is non-zero, the relative error is 100%. This is correct. Non-zero numbers that are rounded to zero have a relative error of 100%. – Eric Postpischil
@EricPostpischil: the computed result is not zero, it is 0 < x <= 0.5 epsilon 2^Em. It rounds to 0, because of the nearest rounding rule, and then the relative error skyrockets from e_r < 0.5 epsilon to 1. I am interested to learn if the special case when the underflow to 0 happens can be handled in my code without checking for equality to 0 of the rounded number. – tmaric
The computed result is the result delivered after rounding. If you do an operation a + b, the ideal mathematical result is a + b, that is, the result of adding a and b with normal mathematics. Conceptually, this produces an intermediate result which is then rounded to the nearest representable value. That representable value is delivered by the computer to a destination (such as a processor register). That delivered value is the computed result. – Eric Postpischil
You have not stated what your code does, so there is no way to know whether underflow to zero can be handled in a way suitable to whatever the goals are for your code. This Stack Overflow question asks about the relative error of a number that is rounded to zero. The relative error is 100%, if the ideal number is not zero. Figuring out how this affects your code and how to handle it is a different question. Perhaps you should be asking that question instead. – Eric Postpischil
This is independent of the arithmetical operation. Given a real number x that is below the minimal denormalized number, round the number to its nearest floating point and compute the relative error. If this error is 1 for all such numbers, and 0.5 epsilon for all other numbers in the normalized range, I have to handle this as a special case, in every program code that bases its decisions on the relative rounding error. – tmaric

1 Answer


When the exact mathematical result of an operation is non-zero, and the final result the computer delivers for the operation is zero, the relative error is 100%.

The formula e_r = |(0 − x) / x| = 1, where x is non-zero, correctly expresses this.

Regarding the question “Should the relative error be even used for the numbers that are rounded to 0?”, the suitability of relative error as a metric depends on the application. If the delivered result has lost all information useful to the application, this is reflected in the fact that the relative error is 100%. If the delivered result is still of some use to the application (for example, it may be enough to know that the result is small while other results are much larger), then the relative error may not be the relevant metric. A specific answer cannot be provided without more information about the application.
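As one illustration of an application where "small is good enough", a mixed relative/absolute test avoids special-casing zero. This is only a sketch of that one option, not a general recommendation: the function name `acceptably_close` and the choice of the smallest normal double as the absolute floor are assumptions made for the example, and whether such a floor is acceptable depends on the application.

```python
import math
import sys

def acceptably_close(computed, exact,
                     rel_tol=sys.float_info.epsilon,
                     abs_tol=sys.float_info.min):
    """Relative test with an absolute floor (hypothetical helper).

    math.isclose accepts when
        |computed - exact| <= max(rel_tol * max(|computed|, |exact|), abs_tol),
    so results lost to underflow pass via abs_tol instead of failing with e_r = 1.
    """
    return math.isclose(computed, exact, rel_tol=rel_tol, abs_tol=abs_tol)

tiny = 1e-320   # a subnormal stand-in for an exact result; its computed value underflowed to 0.0

assert not math.isclose(0.0, tiny, rel_tol=sys.float_info.epsilon)  # pure relative test fails
assert acceptably_close(0.0, tiny)                                   # absolute floor accepts it
assert not acceptably_close(1.0, 1.1)                                # large errors still rejected
```

If the application does need the information carried by tiny values, this floor hides exactly the loss that the 100% relative error reports, so it should not be used there.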