Floating point number comparison trick: inline assembly

Question

A long time ago I used this simple x86 assembler trick to get 0 or 1 as the result of a floating point comparison:

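fld    qword ptr [value1]   ; ST(0) = value1
fcomp  qword ptr [value2]   ; compare ST(0) with value2, set C0..C3, pop ST(0)
fnstsw ax                   ; AX = FPU status word (C0 is bit 8)
mov    al, ah               ; move C0 into bit 0 of AL
and    eax, 1               ; EAX = 1 if value1 < value2 (or unordered), else 0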

This trick avoids branching when the comparison result only selects one of two values. It was fast in the Pentium days; today it may not be much faster, but who knows.
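To make the use case concrete, here is a minimal portable sketch of the selection pattern the trick is meant for: the 0/1 comparison result indexes a two-element table instead of driving an if/else. The helper selectByCompare and its parameters ifLess and otherwise are hypothetical names for illustration, and whether a given compiler actually emits branch-free code for this is not guaranteed.

```cpp
// Returns 1 if value1 < value2, otherwise 0 (also 0 if either value is NaN).
inline int compareDoublesIndexed(const double value1, const double value2) {
    return value1 < value2 ? 1 : 0;
}

// Hypothetical helper: picks one of two values without an explicit if/else.
// table[0] is used when the comparison is false, table[1] when it is true.
inline double selectByCompare(double value1, double value2,
                              double ifLess, double otherwise) {
    const double table[2] = { otherwise, ifLess };
    return table[compareDoublesIndexed(value1, value2)];
}
```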

Now I mainly use C++ and compile with the Intel C++ Compiler or GCC.

Can someone help me rewrite this code for the two inline assembler flavors (Intel and GCC)?

The required function prototype is: inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }

Perhaps SSE2 instructions would be even more efficient. What is your perspective?
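On the SSE2 question, one way to get the same 0/1 result without writing any assembly is the SSE2 intrinsic _mm_ucomilt_sd from emmintrin.h, which both GCC and the Intel compiler accept. This is a sketch assuming an x86 target with SSE2 enabled (the default on x86-64); it typically compiles to a ucomisd followed by a setcc instruction, but the exact code is up to the compiler.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Returns 1 if value1 < value2, otherwise 0.
// _mm_set_sd puts each double in the low lane of an XMM register;
// _mm_ucomilt_sd compares the low lanes (UCOMISD) and returns an int.
inline int compareDoublesIndexed(const double value1, const double value2) {
    return _mm_ucomilt_sd(_mm_set_sd(value1), _mm_set_sd(value2));
}
```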

I've tried this:

__asm__(
    "fcomq %2, %0\n"
    "fnstsw %ax\n"
    "fsubq %2, %0\n"
    "andq $L80, %eax\n"
    "shrq $5, %eax\n"
    "fmulq (%3,%eax), %0\n"
    : "=f" (penv)
    : "0" (penv), "F" (env), "r" (c)
    : "eax" );

But the Intel C++ Compiler reports this error: Floating point output constraint must specify a single register.
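GCC's documented rules for x87 operands in extended asm require every stack-register output to name a single register (for example "=t" for the top of the stack), so a class constraint such as "=f" is rejected, which matches the error above. The sketch below, assuming an x86 target and AT&T syntax, expresses the original x87 trick in GCC's flavor while sidestepping the issue: value1 is passed in ST(0) with the "t" input constraint, and the only output is the status word in AX. It has not been checked against every compiler version, and the Intel compiler's handling of GCC-style asm is assumed, not confirmed.

```cpp
// Sketch: the original x87 trick as GCC extended inline asm (AT&T syntax).
// Returns 1 if value1 < value2, otherwise 0. C0 is also set when the
// operands are unordered, so a NaN input gives 1, as in the original trick.
inline int compareDoublesIndexed(const double value1, const double value2) {
    unsigned short status;           // receives the FPU status word
    __asm__("fcoml %2\n\t"           // compare ST(0) = value1 with value2 in memory
            "fnstsw %0"              // store the status word in AX
            : "=a"(status)           // output: AX
            : "t"(value1),           // input: compiler loads value1 into ST(0) and pops it afterwards
              "m"(value2));          // input: value2 as a 64-bit memory operand
    return (status >> 8) & 1;        // C0 is bit 8 of the status word
}
```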

Can you give some context? Why asm, and what are your performance constraints? - David Heffernan
I need to perform huge numbers of such two-value comparisons in recursive DSP filters, so I can't use SIMD instructions at all. The basic example is: env += ( penv - env ) * ( penv <= env ? envca : envcb ); - aleksv
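As a portable reference point, the filter step from the comment can be written so that the comparison only selects a coefficient. This is a minimal sketch; the function name envelopeStep and the reference-style env parameter are assumptions, and whether the selection is compiled without a branch is left to the compiler.

```cpp
// Hypothetical one-sample update of the filter from the comment:
//   env += (penv - env) * (penv <= env ? envca : envcb);
// The comparison result (false/true, i.e. 0/1) indexes a two-entry coefficient table.
inline void envelopeStep(double& env, double penv, double envca, double envcb) {
    const double coeff[2] = { envcb, envca };   // [0]: penv > env, [1]: penv <= env
    env += (penv - env) * coeff[penv <= env];   // bool converts to 0 or 1
}
```

Called once per sample, env carries the filter state from one call to the next, which is the recursive dependency that rules out vectorizing across samples.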

amdn · Accepted Answer · 2014-03-09T13:17:56

As you mentioned, things have changed since the Pentium days:

- SSE, not x87, is now the preferred instruction set for floating point, even for scalar operations
- optimizing compilers are now very good

So first check what the compiler generates; you might be pleasantly surprised. I tried g++ with -O3 on the following code:

fcmp.cpp:

int compareDoublesIndexed( const double value1, const double value2 ) {
    return value1 < value2 ? 1 : 0;
}

This is what the compiler generated:

0000000000400690 <_Z21compareDoublesIndexeddd>:
  400690:       31 c0                   xor    %eax,%eax
  400692:       66 0f 2e c8             ucomisd %xmm0,%xmm1
  400696:       0f 97 c0                seta   %al
  400699:       c3                      retq

This is what it means:

  xor     %eax,%eax        ; EAX = 0
  ucomisd %xmm0,%xmm1      ; compare value2 (in %xmm1) with value1 (in %xmm0)
  seta    %al              ; AL = value2 > value1 ? 1 : 0

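So the compiler avoided the conditional branch by using the seta instruction, which sets the byte to 1 if the comparison result is "above" (CF = 0 and ZF = 0) and to 0 otherwise. That includes the unordered (NaN) case, for which ucomisd sets ZF, PF and CF.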
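If hand-written assembly is still wanted, the compiler's own sequence can be reproduced in GCC extended asm. This is a sketch assuming an x86 target with SSE2: the "x" constraint places each double in an XMM register, the "q" constraint picks a register with an addressable low byte, and the %b0 modifier names that byte for seta. An asm block is opaque to the optimizer, so the plain C++ version above will usually inline and schedule at least as well.

```cpp
// Sketch: the ucomisd/seta sequence from the disassembly as GCC extended asm.
// Returns 1 if value1 < value2, otherwise 0 (seta is false when unordered, so NaN gives 0).
inline int compareDoublesIndexed(const double value1, const double value2) {
    int result;
    __asm__("xorl %0, %0\n\t"      // result = 0, so the upper bits are clear before seta
            "ucomisd %1, %2\n\t"   // AT&T order: compares value2 with value1, sets ZF/PF/CF
            "seta %b0"             // low byte of result = 1 if value2 > value1
            : "=&q"(result)        // early-clobber: written before all inputs are consumed
            : "x"(value1), "x"(value2)
            : "cc");               // the flags register is modified
    return result;
}
```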
