Floating Point Modulo Operation

Question

I am trying to implement the range reduction operation for trigonometry. But instead I think it might be better to just perform a modulo pi/2 operation on incoming data. I was wondering what algorithms exist and are efficient for this operation for 32-bit IEEE 754 floating-point?

I have to implement this in assembly, so fmod, division, multiplication, etc. aren't available to me with just one instruction. My processor uses 16-bit words and I have implemented 32-bit floating point addition, subtraction, multiplication, division, square root, cosine, and sine. I just need range reduction (modulus) for inputting values to cosine and sine.

Actually there are a lot of clever algorithms for instance google for "payne hanek range reduction", but I think thats not what you want — Gunther Piez
The paper by Ng you linked to in a previous related question of yours actually explains the Payne-Hanek algorithm, which AFAIK is still the state of the art for accurate range reduction. You just have to adapt it to single precision. — janneb
@Everyone, please delete/edit your answer so that it applies to my actual question. I am looking for the algorithm within a floating-point modulus. I need to implement what fmod does and minimize the number of divides I perform. — Veridian
Thanks - fmod was actually exactly what I was looking for (on another project). — Danny Staple
Please note: any modulo technique involving a floating point approximation will be worthless for larger numbers. If you have an approximation of pi to 16 digits, then accurately dividing a 17-digit number by your approximation can have an error greater than 1, meaning the remainder could be absolutely anywhere in the range 0..pi, revealing nothing about the remainder you were really looking for. — mwfearnley

Michał Kosmulski Michał Kosmulski · Accepted Answer · 2012-02-29T20:01:48

I think standard library's fmod() will be the best choice in most cases. Here's a link to a discussion of several simple algorithms.

On my machine, fmod() uses optimized inline assembly code (/usr/include/bits/mathinline.h):

#if defined __FAST_MATH__ && !__GNUC_PREREQ (3, 5)
__inline_mathcodeNP2 (fmod, __x, __y, \
  register long double __value;                           \
  __asm __volatile__                                  \
    ("1:    fprem\n\t"                            \
     "fnstsw    %%ax\n\t"                             \
     "sahf\n\t"                                   \
     "jp    1b"                               \
     : "=t" (__value) : "0" (__x), "u" (__y) : "ax", "cc");           \
  return __value)
#endif

So it actually uses a dedicated CPU instruction (fprem) for the calculation.

Floating Point Modulo Operation

4 Answers