5
votes

What's the fastest way to memoize a recursive function in Haskell?

Background: Recently I have been solving Project Euler problems in Haskell. Many require many computations of a recursively defined combinatorial or number-theoretical function, for example the Fibonacci numbers. Performance improves dramatically if such functions are memoized, that is, the results of the function are cached for later use.

I have seen many solutions to this problem. The most elegant seems to be this. One uses Data.IntMap (or hash tables) and the State monad. A tree based solution suggested in this answer, and such solutions seem fairly common. To give another example, see this blog post. I've seen other solutions using built-in functions. There's one in section 2 here with the fix, and further it seems that the compiler can sometimes be massaged into memoizing without additional work. There's also several prebuilt solutions.

I'm wondering which memoization method is fastest in practice for the kinds of functions used in Project Euler. My intuition says the hashtables library is, since hash tables seem to be the preferred dictionary structure in imperative languages. The purely functional tree solutions are cool, but my Googling tells me they're strictly worse than hash tables in terms of asymptotic performance.

Edit

A few comments said this question is too broad to answer, and upon reflection I agree. So let me give two concrete examples of functions to memoize: a function that computes the n'th Fibonacci number recursively, and one that computes the Catalan numbers recursively. I would like to compute these functions many times for large n.

I'm aware there are explicit formulas for these, but let's ignore that, because the real point here is to use them to benchmark memoization techniques.

1
Binary search trees like IntMap are slower than hash maps in terms of asymptotic performance, yes. Asymptotics aren't everything, though. IntMap has a number of other desirable properties, such as avoiding the overhead of hashing, behaving well when used in a persistent manner, etc. For normal usages IntMap's performance is competitive with a hash table.Benjamin Hodgson
It depends on the problem. If arrays are appropriate, for example when you will eventually have to evaluate the function on every value in a range, then they are probably the most efficient choice.Reid Barton
As a side note, the last stack overflow answer you are linking to is written by Jon Harrop. Here is a comment @sclv wrote a while back that I think is relevant.Alec
You may be interested in this nice link with description how to perform lazy dynamic programming and kind of memoization you want: jelv.is/blog/Lazy-Dynamic-ProgrammingShersh
The best approach still depends on the situation: do you know up front the values you will need to memoize, are they a range of consecutive integers, what time/space tradeoffs do you want to make. The functions themselves aren't really the important thing.Reid Barton

1 Answers

1
votes

When Trying to find the nth fibonacci number the only numbers that you need to memoize are the two previous numbers. you can do it as a tuple like (f n-1, f n ) and on each loop update this tuple. note that updating the tuple is done through pointer manipulation and is not computationally expensive.

a cleaner and slightly smarter alternative would be:

fibs :: [Integer]
fibs = fibcreator 0 1
  where
    fibcreator a b = a : fibcreator b (a+b)

nth = take n fibs

But one of the best algorithms that I have seen is this:

  1. let's define a matrix m = [e11 = 1,e12 =1,e21 = 1,e22 = 0]
  2. to get nth fibonacci number we compute m' = m ^ (n-1)
  3. e11 element in the matrix m' is the nth fibonacci number

Now what is great here is that in order to get 17 fibonacci number we can do

m' = ((((m^2)^2)^2)^2) * m

This significantly reduces the computaion time and passively embeds the memoization within the algorithm. The punchline is that Haskell already uses this algorithm to compute the power function, so you don't need to implement it. the full implementation is :

data Matrix = Matrix Integer Integer Integer Integer

instance Num Matrix where
  (*) (Matrix a11 a12 a21 a22) (Matrix b11 b12 b21 b22)
   = Matrix (a11*b11 + a12*b21) (a11*b12 + a12*b22) (a21*b11 + a22*b21) (a21*b12 + a22*b22)

fib4 :: Integer -> Integer
fib4 0 = 0
fib4 n = x
  where
    (Matrix x _ _ _) = Matrix 1 1 1 0 ^ (n-1)