Describing how each of these flags affects every math function would take too long, so instead I'll give an example for each flag, leaving to you the burden of seeing how it could affect a given function.
-fno-signed-zeros
Assumes that your code doesn't depend on the sign of zero.
In FP arithmetic zero is not an absorbing element w.r.t. multiplication: 0 · x = x · 0 does not always equal 0, because zero has a sign and thus, for example, -3 · 0 = -0 ≠ 0 (where 0 usually denotes +0).
You can see this live on Godbolt, where a multiplication by zero is folded into a constant zero only with -Ofast:
float f(float a)
{
return a*0;
}
;With -Ofast
f(float): # @f(float)
xorps xmm0, xmm0
ret
;With -O3
f(float): # @f(float)
xorps xmm1, xmm1
mulss xmm0, xmm1
ret
As EOF noted in the comments, this also depends on finite arithmetic: folding x · 0 to 0 is also wrong when x is NaN or infinite, since both produce NaN.
-freciprocal-math
Use the reciprocal instead of division: a/b = a · (1/b).
Due to the finiteness of FP precision, the equality doesn't actually hold.
Multiplication is faster than division; see Agner Fog's instruction tables.
See also why-is-freciprocal-math-unsafe-in-gcc?.
Live example on Godbolt:
float f(float a){
return a/3;
}
;With -Ofast
.LCPI0_0:
.long 1051372203 # float 0.333333343
f(float): # @f(float)
mulss xmm0, dword ptr [rip + .LCPI0_0]
ret
;With -O3
.LCPI0_0:
.long 1077936128 # float 3
f(float): # @f(float)
divss xmm0, dword ptr [rip + .LCPI0_0]
ret
-ffp-contract=fast
Enable contraction of FP expressions.
Contraction is an umbrella term for any law valid in the field ℝ that results in a simplified expression, for example a · k / k = a.
However, the set of FP numbers equipped with + and · is not a field in general, due to finite precision.
This flag allows the compiler to contract FP expressions at the cost of correctness.
Live example on Godbolt:
float f(float a){
return a/3*3;
}
;With -Ofast
f(float): # @f(float)
ret
;With -O3
.LCPI0_0:
.long 1077936128 # float 3
f(float): # @f(float)
movss xmm1, dword ptr [rip + .LCPI0_0] # xmm1 = mem[0],zero,zero,zero
divss xmm0, xmm1
mulss xmm0, xmm1
ret
-menable-unsafe-fp-math
Similar to the above, but broader in scope.
Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of some instructions which would otherwise not be usable (such as fsin on X86).
See this about the error precision of the fsin instruction.
Live example on Godbolt, where a⁴ is expanded into (a²)²:
float f(float a){
return a*a*a*a;
}
;With -Ofast
f(float): # @f(float)
mulss xmm0, xmm0
mulss xmm0, xmm0
ret
;With -O3
f(float): # @f(float)
movaps xmm1, xmm0
mulss xmm1, xmm1
mulss xmm1, xmm0
mulss xmm1, xmm0
movaps xmm0, xmm1
ret
-menable-no-nans
Assumes the code generates no NaN values.
In a previous answer of mine I analysed how ICC dealt with complex number multiplication by assuming no NaNs.
Most FP instructions deal with NaNs automatically.
There are exceptions though, such as comparisons; this can be seen live on Godbolt:
bool f(float a, float b){
return a<b;
}
;With -Ofast
f(float, float): # @f(float, float)
ucomiss xmm0, xmm1
setb al
ret
;With -O3
f(float, float): # @f(float, float)
ucomiss xmm1, xmm0
seta al
ret
Note that the two versions are not equivalent, as the -O3 one excludes the case where a and b are unordered, while the -Ofast one includes it in the true result.
While the performance is the same in this case, in complex expressions this asymmetry can lead to different unfoldings/optimisations.
-menable-no-infs
Just like the above but for infinities.
I was unable to reproduce a simple example on Godbolt, but the trigonometric functions need to deal with infinities carefully, especially for complex numbers.
If you browse a glibc implementation's math directory (e.g. sinc) you'll see a lot of checks that should be omitted when compiling with -Ofast.
As Peter Cordes noted in the comments, compile-time constants work differently from the behaviour for runtime-variable values with -ffast-math: an FP multiply will still compile to something like mulss with -ffast-math, and the hardware will still produce NaN in the inf * 0.0 case. What you're seeing in the first example is a compile-time optimisation of anything * 0.0 => 0.0.