10
votes

I was wondering whether, under specific conditions, it is possible to remove floating point errors without resorting to arbitrary-precision datatypes.

The problem is the usual one. The language is Ruby, but it holds in any language:

f = 1829.82
=> 1829.82

f / 12.0
=> 152.485

(f / 12.0).round(2)
=> 152.48

Why not 152.49? Because due to the finite precision of floating points:

format("%18.14f", f)
=> "1829.81999999999994"

format("%18.14f", f / 12.0)
=> "152.48499999999999"

So the rounding is correct. Now my question is: is there a way to get the answer I want anyway, given the following circumstances: there are strong limits on the (number of) operations performed using float, the precision needed is limited to two decimal places (max 8 digits in total) and a small amount of remaining 'wrongly' rounded answers is acceptable?

The situation is such that users may enter valid Ruby strings like:

"foo / 12.0"

where foo is a number provided in the context in which the string is executed, but where '12.0' is something the user enters. Imagine a spreadsheet with some free formula fields. The strings are simply evaluated as Ruby, so 12.0 becomes a Float. I could use the ruby_parser + ruby2ruby gems to build a parse tree, mangle the datatype to Bignum, Rational, something from the Flt library, decimal floating point representations or what-have-you, but that is tricky, as the actual strings can become somewhat more complex, so I prefer not to go this way. I will go that way if nothing else is possible, but this question is specifically here to see if I can avoid that path. As such, the datatype of the 12.0 is strictly Float and the outcome is strictly Float and the only thing I can do is interpret the final answer of the snippet and attempt to 'correct' it, if it rounds the 'wrong' way.

The only calculations the users do involve numbers with a precision of two decimal digits (and at most 8 digits in total). With 'simple' I mean that the floating point errors do not get a chance to accumulate: I may add two of these numbers and divide one by an integer, but then the calculation is done, the result is rounded and stored and any subsequent calculation is based on that rounded number. Usually only one floating point error will be involved, but I think the problem does not significantly alter if two can accumulate, though the residual error rate may be larger by definition.

What may first come to mind is first rounding to 3 decimal digits, then to 2. However, that doesn't work. That would lead to

152.48499999999999 => 152.485 => 152.49

but also

152.4846 => 152.485 => 152.49

which is not what you want.

What next came to my mind is adding the smallest possible increment (which, as people have pointed out, depends on the floating point value under consideration) to a float if that nudges it over the .5 border. I'm mainly wondering how often that could result in a 'false positive': a number to which the smallest increment is added, even though the fact that it was just below the .5 border was not due to a floating point error, but because it was simply the result of the calculation?

A second option is: just always add the smallest increment to numbers, as the .5 region is the only one where it matters anyway.

Edit: I just rewrote the question to incorporate part of my answers in comments, as cdiggins suggested. I awarded the bounty to Ira Baxter for his active participation in the discussion, though I'm not yet convinced he is right: Mark Ransom and Emilio M Bumachar seem to support my idea that a correction is possible that will, in practice, in possibly a relatively large majority of cases, produce the 'correct' result.

I still have to perform the experiment to see how often the result would be correct and I fully intend to, but the time I can spend on this is somewhat limited, so I haven't gotten round to it yet. The experiment is not trivial.

8
Most computer languages stop at about 16 significant digits because they, and the underlying hardware, are based on IEEE 754. Spreadsheets use "decimal" arithmetic with more precision than that.Rick James

8 Answers

6
votes

It sounds like what you want are fixed-precision decimal numbers. A good library implementing these is going to be more reliable than hacking something together yourself.

For Ruby, check out the Flt library.

5
votes

"it is possible to remove floating point errors without resorting to infinite precision datatypes."?

No. Floating point errors are your computer's only mistakes involving number crunching. If you remove all errors, by definition your precision is infinite.

That sounded pedantic, which was not my intention. I'm trying to point out that there is a big conceptual problem underneath your apparently technical issue. You can't correctly round incorrect numbers based on what their correct values would be, unless you know their correct values (i.e. infinite precision values, or other form of carrying that information).

Your idea of adding a small number may work out statistically, but only statistically. I suggest you write a script to test a massive number of cases with random numbers of no more than two decimal places. (The script would need to also do the math with infinite precision, in order to know the right answer to compare) That way, you can measure corrections and false positives.

3
votes

If you are can control the amount of arithmetic (especially multiplies and divides), you can try simply scaling all your floating point values by some power scale of ten (say scale=4). You'll have to change the code do input, output, and multiplies and divides.

Then scale=2 decimal fractions such as 5.10 are stored exactly as 510. Inputs need to be entered accurately; e.g., read in the string mmm.nnnn, move the decimal place scale locations in the string (e.g., for scale=2 ==> mmmnn.nn and then convert the string to float). Addition/Subtraction of such fractional numbers is exact and doesn't need any code changes. Multiplication and division loses some "decimal" precision and needs to be scaled; code that said x*y needs to be changed to x*y/scale; x/y needs to be changed to x*scale/y. You'll can round the string at the scale point and then output it.

This answer is the cheesy version of using a real decimal arithmetic package, mentioned by another poster.

3
votes

The smallest increment that you mention is typically called epsilon. This is the smallest value that can be added to 1.0 to make a noticeable change. If you want to add it to other numbers you must scale it first: x = x + (x * epsilon).

There's another definition of epsilon which is the largest rounding error of a floating point number. This definition should be half of the first one.

In theory adding an epsilon value before rounding will produce as many errors as it corrects. In practice this won't be the case, since numbers that are close to an even decimal number are much more likely to be encountered than random chance would suggest.

2
votes

I notice that in a comment to one of the answers it was argued that changing datatype was hard. Nonetheless I'm going to answer the question as asked:

I was wondering whether, under specific conditions, it is possible to remove floating point errors without resorting to infinite precision datatypes.

In order to achieve exact results you will need to use decimal floating point representations of the numbers and the appropriate math routines. Note that fixed-point math libraries can still result in binary floating point errors, if they use binary representations of the numbers.

1
votes

In the general case, I would say that it's impossible to get the right answer all the time. As you discovered yourself, rounding twice is not the answer. Instead, try to keep the highest precision as long as possible.

However, you do have a full arsenal of function to your disposal. You can round up, round down, round to zero, round to infinity, so if you know what your algorithm is doing, you can use the appropriate function.

I would say that adding a "small" value, or "epsilon" as it's commonly called, is a feasible way. Just keep in mind that if the original value is negative, you would have to substract it rather than adding it. Also, note that if you are dealing with the full range of floating-point values, epsilon might depend on the value.

0
votes

No, you cannot prevent accumulation of floating-point errors because the machine arithmetic always rounds operation results to fit into the given number of bits. In addition to that, take into account that the results of many operations require an infinite number of bits to be represented exactly (e.g. 2/10=0.2 ; but it requires an infinite number of bits to represent exactly in base 2, which is what computers work with).

0
votes

this unfortunately isnt your answer, but it might get you started.

Object:

class Object
  # Return only the methods not present on basic objects
  def local_methods
    (self.methods - Object.new.methods).sort
  end
end

callbacks Module:

module Hooker
  module ClassMethods
  private
    def following(*syms, &block)
      syms.each do |sym| # For each symbol
        str_id = "__#{sym}__hooked__"
        unless private_instance_methods.include?(str_id)
          alias_method str_id, sym    # Backup original method
          private str_id         # Make backup private
          define_method sym do |*args|  # Replace method
            ret = __send__ str_id, *args # Invoke backup
            rval=block.call(self,       # Invoke hook
             :method => sym, 
             :args => args,
             :return => ret
            )
            if not rval.nil?
              ret=rval[:ret]
            end
            ret # Forward return value of method
          end
        end
      end
    end
  end

  def Hooker.included(base)
    base.extend(ClassMethods)
  end
end

And changes to Float to actually do the work:

if 0.1**2 != 0.01 # patch Float so it works by default
  class Float
    include Hooker
    0.1.local_methods.each do |op|
      if op != :round
        following op do |receiver, args|
          if args[:return].is_a? Float
            ret=args[:return].round Float::DIG
            ret=Hash[:ret => ret]
          end
          ret
        end
      end
    end
  end
end

Edit: somewhat better is using Rational. The nmethods overriding still isnt always on (see problems, after the code):

  class Float
    include Hooker
    0.1.local_methods.each do |op|
      if op != :round
        following op do |receiver, args|
          if args[:return].is_a? Float
            argsin=[]
            args[:args].each do |c|
              argsin=c.rationalize
            end
            rval=receiver.rationalize.send(
                args[:method], 
                argsin
               )
            ret=Hash[:ret => rval.to_f]
          end
          ret
        end
      end
    end
  end

Problems: Not all method overrides are working, at least in 1.9.3p0:

pry(main)> 6543.21 % 137.24
=> 92.93
[... but ...]
pry(main)> 19.5.send(:-.to_sym, 16.8)
=> 2.7
pry(main)> 19.5 - 16.8
=> 2.6999999999999993