22
votes

I am trying to implement the range reduction operation for trigonometry. But instead I think it might be better to just perform a modulo pi/2 operation on incoming data. I was wondering what algorithms exist and are efficient for this operation for 32-bit IEEE 754 floating-point?

I have to implement this in assembly, so fmod, division, multiplication, etc. aren't available to me with just one instruction. My processor uses 16-bit words and I have implemented 32-bit floating point addition, subtraction, multiplication, division, square root, cosine, and sine. I just need range reduction (modulus) for inputting values to cosine and sine.

4
Actually there are a lot of clever algorithms for instance google for "payne hanek range reduction", but I think thats not what you want - Gunther Piez
The paper by Ng you linked to in a previous related question of yours actually explains the Payne-Hanek algorithm, which AFAIK is still the state of the art for accurate range reduction. You just have to adapt it to single precision. - janneb
@Everyone, please delete/edit your answer so that it applies to my actual question. I am looking for the algorithm within a floating-point modulus. I need to implement what fmod does and minimize the number of divides I perform. - Veridian
Thanks - fmod was actually exactly what I was looking for (on another project). - Danny Staple
Please note: any modulo technique involving a floating point approximation will be worthless for larger numbers. If you have an approximation of pi to 16 digits, then accurately dividing a 17-digit number by your approximation can have an error greater than 1, meaning the remainder could be absolutely anywhere in the range 0..pi, revealing nothing about the remainder you were really looking for. - mwfearnley

4 Answers

17
votes

I think standard library's fmod() will be the best choice in most cases. Here's a link to a discussion of several simple algorithms.

On my machine, fmod() uses optimized inline assembly code (/usr/include/bits/mathinline.h):

#if defined __FAST_MATH__ && !__GNUC_PREREQ (3, 5)
__inline_mathcodeNP2 (fmod, __x, __y, \
  register long double __value;                           \
  __asm __volatile__                                  \
    ("1:    fprem\n\t"                            \
     "fnstsw    %%ax\n\t"                             \
     "sahf\n\t"                                   \
     "jp    1b"                               \
     : "=t" (__value) : "0" (__x), "u" (__y) : "ax", "cc");           \
  return __value)
#endif

So it actually uses a dedicated CPU instruction (fprem) for the calculation.

16
votes

Maybe I'm missing the point here, but do you have anything against simply using fmod?

double theta = 10.4;
const double HALF_PI = 2 * atan(1);
double result = fmod(theta, HALF_PI);
9
votes

The algorithm you want, to limit a floating point value between 0 and some modulus n:

Double fmod(Double value, Double modulus)
{
    return value - Trunc(value/modulus)*modulus;
}

for example pi mod e (3.14159265358979 mod 2.718281828459045)

3.14159265358979 / 2.718281828459045 
   = 1.1557273497909217179

Trunc(1.1557273497909217179)
   = 1

1.1557273497909217179 - 1
   = 0.1557273497909217179

0.1557273497909217179 * e
   = 0.1557273497909217179 * 2.718281828459045
   = 0.42331082513074800

pi mod e = 0.42331082513074800

0
votes

Exact fmod is implemented with long division. The exact remainder is always representable as the dividend and the divisor share the same format. You can look into open-source implementations like glibc and musl. I have also made one in metallic. (shameless plug)

Payne–Hanek range reduction is for constant divisors like π, whose reciprocal we store in advance. Hence, it is not applicable here.