11
votes

In my quest to understand and harness GHC's automatic memoization, I've hit a wall: when pure functions are called with fixed arguments like fib 42, repeated calls are sometimes fast and sometimes slow. The behavior varies depending on whether they're called plainly, like fib 42, or implicitly through some arithmetic, e.g. (\x -> fib (x - 1)) 43. The cases seem to have no rhyme or reason, so I'll present them and ask: what is the logic behind this behavior?

Consider a slow Fibonacci implementation, which makes it obvious when the memoization is working:

slow_fib :: Int -> Integer
slow_fib n = if n < 2 then 1 else (slow_fib (n - 1)) + (slow_fib (n - 2))

I tested three basic questions to see if GHC (version 8.2.2) will memoize calls with fixed args:

  1. Can slow_fib access previous top-level calls to slow_fib?
  2. Are previous results memoized for later non-trivial (e.g. math) top-level expressions?
  3. Are previous results memoized for later identical top-level expressions?

The answers seem to be:

  1. No
  2. No
  3. Yes [??]

The fact that the last case works is very confusing to me: if reprinting a result is fast, I'd expect adding two such results to be fast as well. Here's the code that shows this:

main = do
  -- 1. all three of these are slow, even though `slow_fib 37` is 
  -- just the sum of the other two results. Definitely no memoization.
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37

  -- 2. also slow, definitely no memoization as well.
  putStrLn $ show $ (slow_fib 35) + (slow_fib 36) + (slow_fib 37)
  putStrLn $ show $ (slow_fib 35) + 1

  -- 3. all three of these are instant. Huh?
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37

Stranger yet, doing math on the results does get cached when it's embedded in a recursive function, as in this Fibonacci variant whose base case starts at Fib(40):

fib_plus_40 n = if n <= 0
  then slow_fib 40
  else fib_plus_40 (n - 1) + fib_plus_40 (n - 2)

Shown by the following:

main = do
  -- slow as expected
  putStrLn $ show $ fib_plus_40 0

  -- instant. Why?!
  putStrLn $ show $ fib_plus_40 1

I can't find any reasoning for this in any explanations of GHC memoization, which typically attribute the caching to explicitly named variables (e.g. here, here, and here). This is why I expected fib_plus_40 to fail to memoize.

2
Good question. Timing is tricky though -- you should learn to read Core for these kinds of experiments. Compile with -ddump-simpl, if I recall. – luqui
I expect that in your first example, it is actually not the case that the compiler has pulled out slow_fib 35 as an expression to share between two calls to putStrLn; but rather that putStrLn $ show $ slow_fib 35 itself has been pulled out for sharing. This explains why putStrLn $ show $ slow_fib 35 + 1 is still slow: it is a separate (perhaps memoized!) action that still must do computation. Indeed, you can verify that if you add a second putStrLn $ show $ slow_fib 35 + 1, it too executes instantly, so this is not a "trivial vs. nontrivial" distinction as you propose. – Daniel Wagner
@DanielWagner Ah yes, that's just as surprising. It looks like it is just plain unpredictable. – concat

2 Answers

5
votes

All of the examples you link at the end exploit the same technique: instead of implementing function f directly, they first introduce a list whose contents are all the calls to f that could ever be made. That list is computed only once, lazily; and then a simple lookup in that list is used as the implementation of the user-facing function. So, they are not relying on any caching from GHC.
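As a minimal sketch of that technique (not code from any particular linked answer), the list-of-results version of fib looks like this:

```haskell
-- All Fibonacci numbers, as a single lazily computed top-level list.
-- Each element starts as a thunk; once forced, its value is kept.
fibs :: [Integer]
fibs = map go [0 ..]
  where
    go 0 = 1
    go 1 = 1
    go n = fibs !! (n - 1) + fibs !! (n - 2)

-- The user-facing function is just a lookup into that shared list.
fib :: Int -> Integer
fib n = fibs !! n
```

Here the caching comes entirely from the shared fibs list, not from GHC deciding to cache calls to fib.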

Your question is different: you hope that calling some function will be automatically cached for you, and in general that does not happen. The real question is why any of your results are fast. I'm not sure, but I think it is to do with Constant Applicative Forms (CAFs), which GHC may share between multiple use sites, at its discretion.

The most relevant feature of a CAF here is the "Constant" part: GHC will only introduce such a cache for an expression whose value is constant throughout the entire run of the program, not just for some particular scope. So, you can be sure that f x <> f x will never reuse the result of f x (at least not due to CAF folding; maybe GHC can find some other excuse to memoize this for some functions, but typically it does not).

The two things in your program that are not CAFs are the implementation of slow_fib, and the recursive case of fib_plus_40. GHC definitely cannot introduce any caching of the results of those expressions. The base case for fib_plus_40 is a CAF, as are all of the expressions and subexpressions in main. So, GHC can choose to cache/share any of those subexpressions, or none of them, as it pleases. Perhaps it sees that slow_fib 40 is "obviously" simple enough to save, but it's not so sure about whether the slow_fib 35 expressions in main should be shared. Meanwhile, it sounds like it does decide to share the IO action putStrLn $ show $ slow_fib 35 for whatever reason. Seems like a weird choice to you and me, but we're not compilers.

The moral here is that you cannot count on this at all: if you want to ensure you compute a value only once, you need to save it in a variable somewhere, and refer to that variable instead of recomputing it.
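For example (using the question's slow_fib; the binding name f35 is just illustrative), a let binding guarantees the sharing that GHC may or may not perform on its own:

```haskell
slow_fib :: Int -> Integer
slow_fib n = if n < 2 then 1 else slow_fib (n - 1) + slow_fib (n - 2)

main :: IO ()
main = do
  let f35 = slow_fib 35   -- one thunk, shared by every use below
  print f35               -- slow: this forces the thunk
  print (f35 + 1)         -- fast: the thunk is already evaluated
```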


To confirm this, I took luqui's advice and looked at the -ddump-simpl output. Here are some snippets showing the explicit caching:

-- RHS size: {terms: 2, types: 0, coercions: 0}
lvl1_r4ER :: Integer
[GblId, Str=DmdType]
lvl1_r4ER = $wslow_fib_r4EP 40#

Rec {
-- RHS size: {terms: 21, types: 4, coercions: 0}
Main.main_fib_plus_40 [Occ=LoopBreaker] :: Integer -> Integer
[GblId, Arity=1, Str=DmdType <S,U>]
Main.main_fib_plus_40 =
  \ (n_a1DF :: Integer) ->
    case integer-gmp-1.0.0.1:GHC.Integer.Type.leInteger#
           n_a1DF Main.main7
    of wild_a2aQ { __DEFAULT ->
    case GHC.Prim.tagToEnum# @ Bool wild_a2aQ of _ [Occ=Dead] {
      False ->
        integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
          (Main.main_fib_plus_40
             (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                n_a1DF Main.main4))
          (Main.main_fib_plus_40
             (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                n_a1DF lvl_r4EQ));
      True -> lvl1_r4ER
    }
    }
end Rec }

This doesn't tell us why GHC is choosing to introduce this cache: remember, it's allowed to do what it wants. But it does confirm the mechanism, that it introduces a variable to hold the repeated calculation. I can't show you Core for your longer main involving smaller numbers, because when I compile it I get more sharing: the expressions in section 2 are cached for me as well.

5
votes

To elaborate in case it wasn't clear from @amalloy's answer, the problem is that you're conflating two things here -- the implicit memoization-like behavior (what people mean when they talk about Haskell's "automatic memoization", though it is not true memoization!) that results directly from thunk-based lazy evaluation, and a compiler optimization technique that's basically a form of common subexpression elimination. The former is predictable, more or less; the latter is at the whim of the compiler.

Recall that real memoization is a property of the implementation of a function: the function "remembers" results calculated for certain combinations of arguments, and may reuse those results instead of recalculating them from scratch when called multiple times with the same arguments. When GHC generates code for functions, it does not automatically generate code to perform this kind of memoization.
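To make the contrast concrete, here is a sketch of real memoization written by hand: a cache that the function itself consults. The helper name memoize is mine, not a standard library function.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import qualified Data.Map.Strict as Map

-- Wrap a function with a mutable cache of previously computed results.
memoize :: (Int -> Integer) -> IO (Int -> IO Integer)
memoize f = do
  cacheRef <- newIORef Map.empty
  pure $ \n -> do
    cache <- readIORef cacheRef
    case Map.lookup n cache of
      Just v  -> pure v                       -- cache hit: no recomputation
      Nothing -> do
        let v = f n
        v `seq` modifyIORef' cacheRef (Map.insert n v)  -- force, then store
        pure v
```

GHC never inserts anything like this for you; ordinary function application compiles to a plain call (or a thunk), with no cache in sight.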

Instead, the code GHC generates to implement function application is unusual. Instead of actually applying the function to arguments to generate the final result as a value, a "result" is immediately constructed in the form of a thunk, which you can view as a suspended function call or a "promise" to deliver a value at a later time.

When, at some future point, the actual value is needed, the thunk is forced (which actually causes the original function call to take place), and the thunk is updated with the value. If that same value is needed again later, the value is already available, so the thunk doesn't need to be forced a second time. This is the "automatic memoization". Note that it takes place at the "result" level rather than the "function" level -- the result of a function application remembers its value; a function does not remember the results it previously produced.
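You can watch a thunk being forced exactly once with Debug.Trace (the binding name expensive is just illustrative; the trace message fires only when evaluation actually happens):

```haskell
import Debug.Trace (trace)

-- A named top-level thunk; the trace fires only when it is first forced.
expensive :: Integer
expensive = trace "forcing expensive..." (sum [1 .. 1000000])

main :: IO ()
main = do
  print expensive  -- "forcing expensive..." appears, then the value
  print expensive  -- value only: the thunk was updated in place
```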

Now, normally the concept of the result of a function application remembering its value would be ridiculous. In strict languages, we don't worry that after x = sqrt(10), reusing x will cause multiple sqrt calls because x hasn't "memoized" its own value. That is, in strict languages, all function application results are "automatically memoized" in the same sense they are in Haskell.

The difference is lazy evaluation, which allows us to write something like:

stuff = map expensiveComputation [1..10000]

which returns a thunk immediately without performing any expensive computations. Afterwards:

f n = stuff !! n

magically creates a memoized function, not because GHC generates code in the implementation of f to somehow memoize the call f 1000, but because f 1000 forces (a bunch of list constructor thunks and then) a single expensiveComputation whose return value is "memoized" as the value at index 1000 in the list stuff -- it was a thunk, but after being forced, it remembers its own value, just like any value in a strict language would.
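A traced variant makes this visible: only the element actually demanded gets computed, and only the first time. Here expensiveComputation is just a stand-in squaring function:

```haskell
import Debug.Trace (trace)

stuff :: [Integer]
stuff = map expensiveComputation [1 .. 10000]
  where
    expensiveComputation n = trace ("computing " ++ show n) (n * n)

f :: Int -> Integer
f n = stuff !! n

main :: IO ()
main = do
  print (f 1000)  -- forces the list spine up to index 1000 and that one
                  -- element: a single "computing 1001" trace appears
  print (f 1000)  -- no trace: the list cell already holds the value
```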

So, given your definition of slow_fib, none of your examples are actually making use of Haskell's automatic memoization, in the usual sense people mean. Any speedups you're seeing are the result of various compiler optimizations that are (or aren't) recognizing common subexpressions or inlining / unwrapping short loops.

To write a memoized fib, you need to do it as explicitly as you would in a strict language, by creating a data structure to hold the memoized values, though lazy evaluation and mutually recursive definitions can sometimes make it seem like it's "automatic":

import qualified Data.Vector as V
import Data.Vector (Vector,(!))

fibv :: Vector Integer
fibv = V.generate 1000000 getfib
  where getfib 0 = 1
        getfib 1 = 1
        getfib i = fibv ! (i-1) + fibv ! (i-2)

fib :: Int -> Integer
fib n = fibv ! n