This question demonstrates a very interesting phenomenon: denormalized floats slow down the code by more than an order of magnitude.

The behavior is well explained in the accepted answer. However, there is one comment, currently with 153 upvotes, to which I cannot find a satisfactory answer:

Why isn't the compiler just dropping the +/- 0 in this case?!? – Michael Dorgan

Side note: I have the impression that 0.0f is (or must be) exactly representable, and furthermore that its binary representation must be all zeroes, but I can't find such a claim in the C11 standard. A quote proving this, or an argument disproving it, would be most welcome. Regardless, Michael's question is the main question here.


From C11 §5.2.4.2.2:

An implementation may give zero and values that are not floating-point numbers (such as infinities and NaNs) a sign or may leave them unsigned.
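As that quote says, the core C standard leaves signed zeros optional. The representation is only pinned down when an implementation conforms to Annex F (IEC 60559, i.e. IEEE 754), which it signals by defining __STDC_IEC_559__; in the IEEE 754 binary32 format, +0.0f is all zero bits and -0.0f differs only in the sign bit. A minimal sketch for checking this on a given platform, assuming float is 32 bits wide (float_bits is a hypothetical helper name):

```c
#include <stdint.h>
#include <string.h>

/* This sketch assumes float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the raw bit pattern of a float.
   memcpy is the well-defined way to reinterpret the bytes; casting a
   float* to uint32_t* would violate strict aliasing. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, float_bits(0.0f) should return 0x00000000 and float_bits(-0.0f) should return 0x80000000, even though 0.0f == -0.0f compares true.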

This is answered in one of the last comments on the answer to the linked question: "@s73v3r: The +0.f cannot be optimized out because floating-point has a negative 0, and the result of adding +0.f to -.0f is +0.f. So adding 0.f is not an identity operation and cannot be optimized out. – Eric Postpischil" – Michael Burr
And to be clear: it's not the +0.f or -0.f that is denormalized; it's the value in the array that zero is being added to that is denormalized (and causing the slowdown). – Michael Burr
I don't think the edit changes anything. The floating-point implementation used by MSVC has signed zeros. That may not be required by the C standard, but it might be required by IEEE 754 (I honestly don't know). However, the /fp:fast option might let the compiler optimize away the +0.f; I don't know. – Michael Burr
I don't think the C or C++ standards specify how a floating-point zero should be represented. However, my understanding is that IEEE 754 specifies that zero is represented by all zero bits (except for the sign bit in the case of negative zero). But I'm very far from an expert on floating point, and know next to nothing about the details of the IEEE standard. So what I say in this comment probably isn't very useful. – Michael Burr
Now it has 153 votes. – L. F.

2 Answers

The compiler cannot eliminate the addition of a floating-point positive zero because it is not an identity operation. By IEEE 754 rules, the result of adding +0. to −0. is not −0.; it is +0.

The compiler may eliminate the subtraction of +0. or the addition of −0. because those are identity operations.
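These rules can be checked directly with a small sketch (the function names are hypothetical). It assumes IEEE 754 doubles in the default round-to-nearest mode and a build without -ffast-math or -fno-signed-zeros, since those GCC/Clang options permit the compiler to ignore the sign of zero:

```c
#include <math.h>
#include <stdbool.h>

/* The three candidate "no-op" expressions. */
double add_pos_zero(double x) { return x + 0.0; }   /* not an identity */
double sub_pos_zero(double x) { return x - 0.0; }   /* identity */
double add_neg_zero(double x) { return x + -0.0; }  /* identity */

/* True if x is a zero with its sign bit set. */
bool is_negative_zero(double x)
{
    return x == 0.0 && signbit(x);
}
```

With x = -0.0, add_pos_zero returns +0.0 while the other two return -0.0. That one input is enough to stop the compiler from replacing x + 0. with x.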

For example, when I compile this:

double foo(double x) { return x + 0.; }

with Apple GNU C 4.2.1 using -O3 on an Intel Mac, the resulting assembly code contains addsd LC0(%rip), %xmm0. When I compile this:

double foo(double x) { return x - 0.; }

there is no add instruction; the assembly merely returns its input.

So it is likely that the compiled code in the original question contained an add instruction for this statement:

y[i] = y[i] + 0;

but contained no instruction for this statement:

y[i] = y[i] - 0;

The first statement, however, performed arithmetic on the subnormal values in y[i], and that alone was enough to slow down the program.
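The key point is that adding zero does not change a subnormal value, so the slow version keeps feeding subnormal operands into an addition on every iteration. A sketch assuming IEEE 754 binary32 floats and no flush-to-zero or denormals-are-zero mode (which -ffast-math can enable on x86); is_subnormal and add_zero are hypothetical helper names:

```c
#include <math.h>
#include <stdbool.h>

/* True if f is subnormal: non-zero but smaller in magnitude than
   FLT_MIN, the smallest normal float. */
bool is_subnormal(float f)
{
    return fpclassify(f) == FP_SUBNORMAL;
}

/* Mirrors the slow version's y[i] = y[i] + 0;
   adding zero returns y unchanged, so a subnormal y stays subnormal,
   and the addition itself has a subnormal operand. */
float add_zero(float y)
{
    return y + 0;
}
```

For example, 1e-39f lies below FLT_MIN (about 1.18e-38), so it is subnormal, and add_zero(1e-39f) returns that same subnormal value.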

It is not the zero constant 0.0f that is denormalized; it is the values that approach zero with each iteration of the loop. As they get closer and closer to zero, they eventually fall below the smallest normal float and can then only be represented as denormals, which have fewer significant bits. In the original question, these are the y[i] values.
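How quickly a shrinking value reaches the denormal range can be illustrated with a deliberately simplified, hypothetical loop that halves a float each iteration (this is not the update from the original question). Assuming IEEE 754 binary32, 2^-126 (FLT_MIN) is the smallest normal float, so the value becomes subnormal on the 127th halving:

```c
#include <math.h>

/* Start at 1.0f and halve repeatedly, counting the steps until the
   value first becomes subnormal. Every halving here is exact, so the
   count does not depend on rounding. */
int halvings_until_subnormal(void)
{
    float v = 1.0f;
    int steps = 0;
    while (fpclassify(v) != FP_SUBNORMAL) {
        v *= 0.5f;
        ++steps;
    }
    return steps;
}
```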

The crucial difference between the slow and fast versions of the code is the fast version's statement y[i] = y[i] + 0.1f (the slow version adds 0 instead). As soon as this line executes, the tiny value in y[i] is lost to rounding, so y[i] no longer needs a denormal to represent it. Afterwards, floating-point operations on y[i] remain fast because its values are no longer denormalized.
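A sketch of that effect, assuming (as in the linked question) that the fast version adds 0.1f and then subtracts it again, and assuming IEEE 754 binary32 with round-to-nearest; add_then_sub_tenth is a hypothetical name:

```c
/* Mirrors the fast version's update:
   y[i] = y[i] + 0.1f; y[i] = y[i] - 0.1f;
   A subnormal y is far smaller than half the gap between neighboring
   floats near 0.1f, so y + 0.1f rounds to exactly 0.1f and the
   subtraction then gives exactly 0.0f. */
float add_then_sub_tenth(float y)
{
    float t = y + 0.1f;   /* the tiny y is absorbed by rounding */
    return t - 0.1f;      /* 0.1f - 0.1f == 0.0f: no longer subnormal */
}
```

Passing a subnormal such as 1e-39f returns exactly 0.0f, so later operations work on a zero instead of a denormal.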

Why is the extra precision lost when you add 0.1f? Because floating-point numbers have only a limited number of significant digits. Suppose a format can store three significant decimal digits. Then 0.00001 = 1e-5 is representable, but 0.00001 + 0.1 = 0.1 in that format, because it has no room to store the least significant digit of 0.10001.
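The same absorption happens with real floats. Near 0.1, consecutive binary32 values are 2^-27 (about 7.45e-9) apart, so any addend smaller than half that gap is rounded away. A sketch assuming IEEE 754 binary32 and no excess precision (FLT_EVAL_METHOD == 0, as on typical x86-64 and ARM builds); changes_tenth is a hypothetical helper:

```c
#include <stdbool.h>

/* Returns true if adding `small` to 0.1f changes the stored float.
   Near 0.1f consecutive floats are 2^-27 apart, so an addend smaller
   than half that gap (about 3.7e-9) rounds back to 0.1f, just like the
   0.00001 in the three-digit example. */
bool changes_tenth(float small)
{
    float sum = 0.1f + small;
    return sum != 0.1f;
}
```

For example, 1e-9f is absorbed completely, while 1e-8f is large enough to move the sum to the next float.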