47
votes

I've been reading a lot about floating-point determinism in .NET, i.e. ensuring that the same code with the same inputs will give the same results across different machines. Since .NET lacks options like Java's fpstrict and MSVC's fp:strict, the consensus seems to be that there is no way around this issue using pure managed code. The C# game AI Wars has settled on using Fixed-point math instead, but this is a cumbersome solution.

The main issue appears to be that the CLR allows intermediate results to live in FPU registers that have higher precision than the type's native precision, leading to impredictably higher precision results. An MSDN article by CLR engineer David Notario explains the following:

Note that with current spec, it’s still a language choice to give ‘predictability’. The language may insert conv.r4 or conv.r8 instructions after every FP operation to get a ‘predictable’ behavior. Obviously, this is really expensive, and different languages have different compromises. C#, for example, does nothing, if you want narrowing, you will have to insert (float) and (double) casts by hand.

This suggests that one may achieve floating-point determinism simply by inserting explicit casts for every expression and sub-expression that evaluates to float. One might write a wrapper type around float to automate this task. This would be a simple and ideal solution!

Other comments however suggest that it isn't so simple. Eric Lippert recently stated (emphasis mine):

in some version of the runtime, casting to float explicitly gives a different result than not doing so. When you explicitly cast to float, the C# compiler gives a hint to the runtime to say "take this thing out of extra high precision mode if you happen to be using this optimization".

Just what is this "hint" to the runtime? Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL? Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size? Only if both of these are true can we rely on explicit casts to provide floating point "predictability" as explained by David Notario.

Finally, even if we can indeed coerce all intermediate results to the type's native size, is this enough to guarantee reproducibility across machines, or are there other factors like FPU/SSE run-time settings?

2

2 Answers

28
votes

Just what is this "hint" to the runtime?

As you conjecture, the compiler tracks whether a conversion to double or float was actually present in the source code, and if it was, it always inserts the appropriate conv opcode.

Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL?

No, but I assure you that there are unit tests in the compiler test cases that ensure that it does. Though the specification does not demand it, you can rely on this behaviour.

The specification's only comment is that any floating point operation may be done in a higher precision than required at the whim of the runtime, and that this can make your results unexpectedly more accurate. See section 4.1.6.

Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size?

Yes, in Partition I, section 12.1.3, which I note you could have looked up yourself rather than asking the internet to do it for you. These specifications are free on the web.

A question you didn't ask but probably should have:

Is there any operation other than casting that truncates floats out of high precision mode?

Yes. Assigning to a static field, instance field or element of a double[] or float[] array truncates.

Is consistent truncation enough to guarantee reproducibility across machines?

No. I encourage you to read section 12.1.3, which has much interesting to say on the subject of denormals and NaNs.

And finally, another question you did not ask but probably should have:

How can I guarantee reproducible arithmetic?

Use integers.

25
votes

The 8087 Floating Point Unit chip design was Intel's billion dollar mistake. The idea looks good on paper, give it an 8 register stack that stores values in extended precision, 80 bits. So that you can write calculations whose intermediate values are less likely to lose significant digits.

The beast is however impossible to optimize for. Storing a value from the FPU stack back to memory is expensive. So keeping them inside the FPU is a strong optimization goal. Inevitable, having only 8 registers is going to require a write-back if the calculation is deep enough. It is also implemented as a stack, not freely addressable registers so that requires gymnastics as well that may produce a write-back. Inevitably a write back will truncate the value from 80-bits back to 64-bits, losing precision.

So consequences are that non-optimized code does not produce the same result as optimized code. And small changes to the calculation can have big effects on the result when an intermediate value ends up needing to be written back. The /fp:strict option is a hack around that, it forces the code generator to emit a write-back to keep the values consistent, but with the inevitable and considerable loss of perf.

This is a complete rock and a hard place. For the x86 jitter they just didn't try to address the problem.

Intel didn't make the same mistake when they designed the SSE instruction set. The XMM registers are freely addressable and don't store extra bits. If you want consistent results then compiling with the AnyCPU target, and a 64-bit operating system, is the quick solution. The x64 jitter uses SSE instead of FPU instructions for floating point math. Albeit that this added a third way that a calculation can produce a different result. If the calculation is wrong because it loses too many significant digits then it will be consistently wrong. Which is a bit of a bromide, really, but typically only as far as a programmer looks.