
rootn (float_t x, int_t n) is a function that computes the n-th root x^(1/n) and is supported by some programming languages such as OpenCL. When IEEE-754 floating-point numbers are used, efficient low-accuracy starting approximations for any n can be generated based on simple manipulation of the underlying bit pattern, assuming only normalized operands x need to be processed.

The binary exponent of rootn (x, n) will be 1/n of the binary exponent of x. The exponent field of an IEEE-754 floating-point number is biased. Instead of un-biasing the exponent, dividing it, and re-biasing the result, we can simply divide the biased exponent by n, then apply an offset to compensate for the previously neglected bias. Furthermore, instead of extracting and then dividing the exponent field, we can simply divide the entire operand x, re-interpreted as an integer, by n. The required offset is trivial to find, since an argument of 1 must return a result of 1 for any n.
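
As a worked instance for n = 2: the bit pattern of 1.0f is 0x3f800000, and dividing it (as an integer) by 2 gives 0x1fc00000. Requiring that an argument of 1.0f produce a result of 1.0f therefore forces

offset = 0x3f800000 - 0x1fc00000 = 0x1fc00000

which matches the constant in the first worked example below.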

If we have two helper functions at our disposal, __int_as_float(), which reinterprets an int32 operand as an IEEE-754 binary32, and __float_as_int(), which reinterprets an IEEE-754 binary32 operand as an int32, we arrive at the following low-accuracy approximation to rootn (x, n) in straightforward fashion:

rootn (x, n) ~= __int_as_float((int)(__float_as_int(1.0f)*(1.0-1.0/n)) + __float_as_int(x)/n)
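
For illustration, here is a minimal self-contained C sketch of this computation (an illustration written for this discussion, not code from the question): memcpy-based type punning stands in for the two helper functions, and the offset is computed directly from the condition that an argument of 1 return exactly 1, which reproduces the constants in the worked examples below.

#include <stdint.h>
#include <string.h>

/* stand-in for __int_as_float(): reinterpret int32 bits as binary32 */
static float int_as_float (int32_t i)
{
    float f; memcpy (&f, &i, sizeof f); return f;
}

/* stand-in for __float_as_int(): reinterpret binary32 bits as int32 */
static int32_t float_as_int (float f)
{
    int32_t i; memcpy (&i, &f, sizeof i); return i;
}

/* low-accuracy approximation to the n-th root; x must be positive and
   normalized, n must be nonzero; works for negative n as well */
static float rootn_approx (float x, int n)
{
    int32_t one = float_as_int (1.0f);   /* 0x3f800000 */
    int32_t offset = one - one / n;      /* e.g. 0x1fc00000 for n = 2 */
    return int_as_float (offset + float_as_int (x) / n);
}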

The integer division __float_as_int (x) / n can be reduced to a shift, or a multiplication plus a shift, via the well-known optimizations for integer division by a constant divisor. Some worked examples are:

rootn (x,  2) ~= __int_as_float (0x1fc00000 + __float_as_int (x) / 2)  // sqrt (x)
rootn (x,  3) ~= __int_as_float (0x2a555556 + __float_as_int (x) / 3)  // cbrt (x)
rootn (x, -1) ~= __int_as_float (0x7f000000 - __float_as_int (x) / 1)  // rcp (x)
rootn (x, -2) ~= __int_as_float (0x5f400000 - __float_as_int (x) / 2)  // rsqrt (x)
rootn (x, -3) ~= __int_as_float (0x54aaaaaa - __float_as_int (x) / 3)  // rcbrt (x)
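
As a sketch of that strength reduction (assuming positive, normalized x, so the re-interpreted integer is non-negative, and reusing the stand-in helpers from the sketch above; 0xaaaaaaab is the standard magic multiplier for unsigned division by 3):

#include <stdint.h>

static float sqrt_approx (float x)
{
    int32_t i = float_as_int (x);     /* non-negative for normalized x > 0 */
    return int_as_float (0x1fc00000 + (i >> 1));   /* i >> 1 == i / 2 here */
}

static float cbrt_approx (float x)
{
    uint32_t i = (uint32_t) float_as_int (x);
    uint32_t q = (uint32_t) (((uint64_t) i * 0xaaaaaaabu) >> 33);  /* i / 3 */
    return int_as_float (0x2a555556 + (int32_t) q);
}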

With all these approximations, the result will be exact only when x = 2^(n*m) for integer m. Otherwise, the approximation will overestimate the true mathematical result. We can approximately halve the maximum relative error by reducing the offset slightly, leading to a balanced mix of underestimation and overestimation. This is easily accomplished by a binary search for the optimal offset that uses all floating-point numbers in the interval [1, 2^n) as test cases. Doing so, we find:

rootn (x, 2) ~= __int_as_float (0x1fbb4f2e + __float_as_int(x)/2) // max rel err = 3.47474e-2
rootn (x, 3) ~= __int_as_float (0x2a51067f + __float_as_int(x)/3) // max rel err = 3.15547e-2
rootn (x,-1) ~= __int_as_float (0x7ef311c2 - __float_as_int(x)/1) // max rel err = 5.05103e-2
rootn (x,-2) ~= __int_as_float (0x5f37642f - __float_as_int(x)/2) // max rel err = 3.42128e-2
rootn (x,-3) ~= __int_as_float (0x54a232a3 - __float_as_int(x)/3) // max rel err = 3.42405e-2
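
For reference, a brute-force version of such a search could look as follows (a sketch for positive n only, not necessarily the search actually used; it rescans every binary32 value in [1, 2^n) per probe, so it is slow but straightforward):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

static float int_as_float (int32_t i) { float f; memcpy (&f, &i, sizeof f); return f; }
static int32_t float_as_int (float f) { int32_t i; memcpy (&i, &f, sizeof i); return i; }

/* worst over-/underestimation with a given offset, over all binary32
   values in [1, 2^n); assumes n > 0 */
static void scan_errors (int32_t offset, int n, double *over, double *under)
{
    int32_t lim = float_as_int ((float)(1 << n));
    *over = 0.0; *under = 0.0;
    for (int32_t i = float_as_int (1.0f); i < lim; i++) {
        float x = int_as_float (i);
        float a = int_as_float (offset + float_as_int (x) / n);
        double r = pow ((double)x, 1.0 / n);
        double relerr = ((double)a - r) / r;
        if (relerr > *over)  *over  = relerr;
        if (relerr < *under) *under = relerr;
    }
}

int main (void)
{
    int n = 2;
    int32_t one = float_as_int (1.0f);
    int32_t hi = one - one / n;    /* raw offset: overestimation only */
    int32_t lo = hi - (1 << 23);   /* generous lower bound (assumption) */
    double over, under;
    while (lo < hi) {              /* bisect to balance the two error bounds */
        int32_t mid = lo + (hi - lo) / 2;
        scan_errors (mid, n, &over, &under);
        if (over > -under) hi = mid;   /* still overestimating too much */
        else               lo = mid + 1;
    }
    /* the optimum is at the crossing or just below it; check the neighbor */
    scan_errors (lo, n, &over, &under);
    double best = fmax (over, -under);
    scan_errors (lo - 1, n, &over, &under);
    if (fmax (over, -under) < best) { best = fmax (over, -under); lo = lo - 1; }
    printf ("offset = 0x%08x  max rel err = %.5e\n", lo, best);
    return 0;
}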

Some may notice that the computation for rootn (x,-2) is basically the initial portion of Quake's fast inverse square root.

Based on observing the differences between the original raw offset and the final offset optimized to minimize the maximum relative error, I was able to formulate a heuristic for the secondary correction, and thus for the final, optimized offset value.

However, I am wondering whether it is possible to determine the optimal offset by some closed-form formula, such that the maximum absolute value of the relative error, max (|(approx(x,n) - x^(1/n)) / x^(1/n)|), is minimized for all x in [1, 2^n). For ease of exposition, we can restrict to binary32 (IEEE-754 single-precision) numbers.

I am aware that, in general, there is no closed-form solution for minimax approximations; however, I am under the impression that closed-form solutions do exist for the case of polynomial approximations to algebraic functions like the n-th root. In this case, we have a (piecewise) linear approximation.

I'm not totally clear on what your objective function is. You want to minimise average relative error? Worst-case relative error? Root-mean-square relative error? Number of cases where relative error is above some threshold? - tmyklebu
The objective is to minimize the maximum relative error. I will clarify the question. - njuffa
I realize I'm side-stepping the question, but you might be happiest with just doing one or two Newton-Raphson steps following the initial approximation. For well-conditioned systems like this, each one essentially doubles the precision. - Sneftel
I am fully aware of common approaches to refining starting approximations for the n-th root. Side remark: I frequently use iterations with cubic convergence rather than Newton-Raphson, which provides only quadratic convergence (see the sketch after these comments). I am also aware that the offset needed for an optimal starting approximation using the approach above will change slightly based on the exact nature of the subsequent iteration. However, for now I am wondering whether there is a closed-form expression just for the optimal approximation itself, which seems to be a hard enough problem. - njuffa
The input where the error of the approximation with the optimal offset is relatively largest seems to always be slightly less than a power of 2. This suggests to me a plan of attack where we ballpark the offset by considering only the bias adjustment, figure out which power of 2 should be near worst (I would conjecture that this can be computed simply from the approximate offset), then solve some equation where we know what the relevant pieces of the piecewise functions involved should be. - David Eisenstat
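
For reference, the kind of one-step Newton-Raphson refinement discussed in these comments might look as follows for the rsqrt case (a sketch using the intrinsic names from the question, not code from any answer):

/* refine the rootn (x, -2) starting approximation; one Newton-Raphson
   step for f(y) = 1/y^2 - x roughly squares the relative error */
static float rsqrt_refined (float x)
{
    float y = __int_as_float (0x5f37642f - __float_as_int (x) / 2);
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}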

1 Answer


Here's some Octave (MATLAB) code that computes offsets for positive n assuming the conjectures below. Seems to work for 2 and 3, but I suspect that one of the assumptions breaks down when n is too large. No time to investigate right now.

% finds the best offset for positive n
% assuming that the conjectures below are true
function oint = offset(n)
  % find the worst upward error with no offset; per the analysis below,
  % it occurs at x = 2^e with e one of the two integers nearest
  % (1/log(2) - 1) * n
  efrac = (1 / log(2) - 1) * n;
  evec = [floor(efrac) ceil(efrac)];
  rootnvec = 2 .^ (evec / n);          % true n-th roots at the candidates
  [abserr, i] = max((1 + evec / n) ./ rootnvec);
  relerr = abserr - 1;                 % worst upward relative error
  rootnx = rootnvec(i);
  % conjecture: z such that approx(z, n) = 1
  % should have the worst downward error;
  % balance the two errors as a function of the real-valued offset o
  fun = @(o) relerr - o / rootnx + (1 / (1 + o * n) ^ (1 / n) - 1);
  oreal = fzero(fun, [0 1]);
  % convert to integer units of the bit-level offset
  oint = round((127 * (1 - 1 / n) - oreal) * 2 ^ 23);
end
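
Saved as offset.m, a call like offset(2) or offset(3) should then produce an offset close to the exhaustively determined values from the question (0x1fbb4f2e and 0x2a51067f, respectively).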

Partial answer for positive n only -- I'm just going to nibble a bit by conjecturing the worst overestimate, assuming that we don't adjust the offset downward.

Let's define an idealized version of the approximation for x in [1, 2^n).

rootn-A(x, n) = 1 + floor(lg(x))/n + ((x/2^floor(lg(x))) - 1) / n
                    ^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    contribution of       contribution of the
                     the exponent             significand

We want to maximize rootn-A(x, n) / x^(1/n).

It seems experimentally that the maximum occurs when x is a power of two. In this case, the significand term is zero and floor(lg(x)) = lg(x), so we can maximize

(1 + lg(x)/n) / x^(1/n).

Substitute y = lg(x)/n, and we can maximize (1 + y) / 2^y for y in [0, 1) such that n*y is an integer. Dropping the integrality condition, it is a calculus exercise to show that the max of this concave function is at y = 1/log(2) - 1, about 0.4426950408889634. It follows that the maximum for x a power of two is at either x = 2^floor((1/log(2) - 1) * n) or x = 2^ceil((1/log(2) - 1) * n). I conjecture that one of these is in fact the global maximum.
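
To fill in the calculus step: the derivative of (1 + y) / 2^y is 2^(-y) * (1 - (1 + y) * log(2)), which vanishes when 1 + y = 1/log(2). As a concrete check for n = 2: (1/log(2) - 1) * 2 ~= 0.885, so the integer candidates are n*y = 0 and n*y = 1, i.e. x = 1 and x = 2; at x = 2 the ratio is (1 + 1/2) / 2^(1/2) ~= 1.0607, a worst-case overestimate of about 6.07% for the raw offset, consistent with the roughly halved 3.47% that the question reports for the balanced offset.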

On the underestimation end, it appears that we want x such that the output of rootn(x, n) is 1, at least for small n. More to come later hopefully.