3
votes

Intervals of floating-point bounds can be used to over-approximate sets of reals, as long as the upper bound of any result interval is computed in round-upwards and the lower bound in round-downwards.

One recommended trick is to store and compute the negation of the lower bound instead of the lower bound itself. This makes it possible to keep the FPU in round-upwards at all times (see, for instance, “Handbook of Floating-Point Arithmetic”, 2.9.2).

This works well for addition and for multiplication: negating an operand negates the result, so the downward-rounded bound can be obtained from an upward-rounded operation on negated inputs. The square root operation, on the other hand, has no such symmetry, since its argument cannot be negated.

It occurs to me that, in order to compute sqrtRD for the lower bound, the following idiom, despite its complications, might be faster than changing the rounding mode twice on an ordinary platform with IEEE 754 double-precision and FLT_EVAL_METHOD defined to 0:

#include <fenv.h>
#include <math.h>
#pragma STDC FENV_ACCESS ON
…
/* assumes the FPU is in round-upwards mode */
double sqrt_rd(double l) {
  feclearexcept(FE_INEXACT);
  double candidate = sqrt(l);        /* rounded up */
  if (fetestexcept(FE_INEXACT))
    return nextafter(candidate, 0);  /* step down one ulp */
  return candidate;
}

I am wondering whether this is better, and, if it is, whether it is the fastest yet. As one possible alternative, still not necessarily the fastest, it seems to me that FMARU(candidate, candidate, -l) might not always be exact (because of the directed rounding) but might be accurate enough around 0 for the following to work:

/* assumes the FPU is in round-upwards mode */
double sqrt_rd(double l) {
  double candidate = sqrt(l);
  if (fma(candidate, candidate, -l) != 0.0)  /* nonzero residual: sqrt was inexact? */
    return nextafter(candidate, 0);
  return candidate;
}

What other inexpensive ways are there to detect that sqrt was inexact? What combination of floating-point operations leads to the fastest computation of sqrt_rd on a modern FPU set to round upwards?

2
I'd suspect that depends on the implementation and the actual environment. – too honest for this site
@Olaf I have updated the question with the information that this is for a platform with IEEE 754 double-precision and FLT_EVAL_METHOD=0. – Pascal Cuoq
That does not help much. "Implementation" is the compiler and "environment" the target platform/architecture. To show the extremes: there might be no FPU available, or the whole function can result in a single FPU instruction. – too honest for this site
@Olaf Anyone who prefers to benchmark their solution can use a Haswell processor and GCC 5.3.0. I was hoping for something that's so obviously a gain for such a wide class of platforms that it doesn't need to be benchmarked, like the original “compute mult_rd(x, y) as -mult_ru(-x, y)” trick. – Pascal Cuoq

2 Answers

4
votes

I think you should be able to use:

/* assumes round-upwards */
double sqrt_rd(double l) { 
  double u = sqrt(l);
  double w = u*u;
  if (w != l)
    return nextafter(u, 0);
  return u;
}

The justification here being that if u is inexact, then it is strictly greater than √l, which in turn implies that w ≥ u² > l (since w is also calculated in round-upwards mode). And if u is exact, then so is w, since u² = l is known to be representable as a double.

1
votes

fma calculates the result with infinite precision, then applies the rounding mode.

If your candidate is too large, then the infinitely precise result of the fma is greater than 0, and since you are rounding up, it will be rounded to something greater than 0, even if it is only a tiny little bit above zero. To verify this, first try l = 1 + 2·eps, for which the candidate 1 + eps = sqrt(1 + 2·eps + eps²) is just a tiny bit too large; then scale l down by a negative power of 4 so that the eps² term falls well below the resolution of denormalized numbers, and check that case as well.