GHC: Are there consistent rules for memoization for calls with fixed values?

Question

In my quest to understand and harness GHC automatic memoization, I've hit a wall: when pure functions are called with fixed values like fib 42, they are sometimes fast and sometimes slow when called again. It varies if they're called plainly like fib 42 or implicitly through some math, e.g. (\x -> fib (x - 1)) 43. The cases have no seeming rhyme or reason, so I'll present them with the intention of asking what the logic is behind the behavior.

Consider a slow Fibonacci implementation, which makes it obvious when the memoization is working:

slow_fib :: Int -> Integer
slow_fib n = if n < 2 then 1 else (slow_fib (n - 1)) + (slow_fib (n - 2))

I tested three basic questions to see if GHC (version 8.2.2) will memoize calls with fixed args:

Can slow_fib access previous top-level calls to slow_fib?
Are previous results memoized for later non-trivial (e.g. math) top-level expressions?
Are previous results memoized for later identical top-level expressions?

The answers seem to be:

No
No
Yes [??]

The fact that the last case works is very confusing to me: if I can reprint the result for example, then I should expect to be able to add them. Here's the code that shows this:

main = do
  -- 1. all three of these are slow, even though `slow_fib 37` is 
  -- just the sum of the other two results. Definitely no memoization.
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37

  -- 2. also slow, definitely no memoization as well.
  putStrLn $ show $ (slow_fib 35) + (slow_fib 36) + (slow_fib 37)
  putStrLn $ show $ (slow_fib 35) + 1

  -- 3. all three of these are instant. Huh?
  putStrLn $ show $ slow_fib 35
  putStrLn $ show $ slow_fib 36
  putStrLn $ show $ slow_fib 37

Yet stranger, doing math on the results worked when it's embedded in a recursive function: this fibonacci variant that starts at Fib(40):

let fib_plus_40 n = if n <= 0
  then slow_fib 40
  else (fib_plus_40 (n - 1)) + (fib_plus_40 (n - 2))

Shown by the following:

main = do
  -- slow as expected
  putStrLn $ show $ fib_plus_40 0

  -- instant. Why?!
  putStrLn $ show $ fib_plus_40 1

I can't find any reasoning for this in any explanations for GHC memoization, which typically incriminate explicit variables (e.g. here, here, and here). This is why I expected fib_plus_40 to fail to memoize.

Good question. Timing is tricky though -- You should learn to read core for these kinds of experiments. Compile with -ddump-simpl, if I recall — luqui
I expect that in your first example, it is actually not the case that the compiler has pulled out slow_fib 35 as an expression to share between two calls to putStrLn; but rather that putStrLn $ show $ slow_fib 35 itself has been pulled out for sharing. This explains why putStrLn $ show $ slow_fib35 + 1 is still slow: it is a separate (perhaps memoized!) action that still must do computation. Indeed, you can verify that if you put a second putStrLn $ show $ slow_fib35 + 1, it too executes instantly, so this is not a "trivial vs. nontrivial" distinction as you propose. — Daniel Wagner
@DanielWagner Ah yes, that's just as surprising. It looks like it is just plain unpredictable. — concat

amalloy amalloy · Accepted Answer · 2019-07-12T06:48:18

All of the examples you link at the end exploit the same technique: instead of implementing function f directly, they first introduce a list whose contents are all the calls to f that could ever be made. That list is computed only once, lazily; and then a simple lookup in that list is used as the implementation of the user-facing function. So, they are not relying on any caching from GHC.

Your question is different: you hope that calling some function will be automatically cached for you, and in general that does not happen. The real question is why any of your results are fast. I'm not sure, but I think it is to do with Constant Applicative Forms (CAFs), which GHC may share between multiple use sites, at its discretion.

The most relevant feature of a CAF here is the "Constant" part: GHC will only introduce such a cache for an expression whose value is constant throughout the entire run of the program, not just for some particular scope. So, you can be sure that f x <> f x will never reuse the result of f x (at least not due to CAF folding; maybe GHC can find some other excuse to memoize this for some functions, but typically it does not).

The two things in your program that are not CAFs are the implementation of slow_fib, and the recursive case of fib_plus_40. GHC definitely cannot introduce any caching of the results of those expressions. The base case for fib_plus_40 is a CAF, as are all of the expressions and subexpressions in main. So, GHC can choose to cache/share any of those subexpressions, and not share any of them, as it pleases. Perhaps it sees that slow_fib 40 is "obviously" simple enough to save, but it's not so sure about whether the slow_fib 35 expressions in main should be shared. Meanwhile, it sounds like it does decide to share the IO action putStrLn $ show $ slow_fib 35 for whatever reason. Seems like a weird choice to you and me, but we're not compilers.

The moral here is that you cannot count on this at all: if you want to ensure you compute a value only once, you need to save it in a variable somewhere, and refer to that variable instead of recomputing it.

To confirm this, I took luqui's advice and looked at the -ddump-simpl output. Here are some snippets showing the explicit caching:

-- RHS size: {terms: 2, types: 0, coercions: 0}
lvl1_r4ER :: Integer
[GblId, Str=DmdType]
lvl1_r4ER = $wslow_fib_r4EP 40#

Rec {
-- RHS size: {terms: 21, types: 4, coercions: 0}
Main.main_fib_plus_40 [Occ=LoopBreaker] :: Integer -> Integer
[GblId, Arity=1, Str=DmdType <S,U>]
Main.main_fib_plus_40 =
  \ (n_a1DF :: Integer) ->
    case integer-gmp-1.0.0.1:GHC.Integer.Type.leInteger#
           n_a1DF Main.main7
    of wild_a2aQ { __DEFAULT ->
    case GHC.Prim.tagToEnum# @ Bool wild_a2aQ of _ [Occ=Dead] {
      False ->
        integer-gmp-1.0.0.1:GHC.Integer.Type.plusInteger
          (Main.main_fib_plus_40
             (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                n_a1DF Main.main4))
          (Main.main_fib_plus_40
             (integer-gmp-1.0.0.1:GHC.Integer.Type.minusInteger
                n_a1DF lvl_r4EQ));
      True -> lvl1_r4ER
    }
    }
end Rec }

This doesn't tell us why GHC is choosing to introduce this cache - remember, it's allowed to do what it wants. But it does confirm the mechanism, that it introduces a variable to hold the repeated calculation. I can't show you core for your longer main involving smaller numbers, because when I compile it I get more sharing: the expressions in section 2 are cached for me as well.

GHC: Are there consistent rules for memoization for calls with fixed values?

2 Answers