2
votes

I am new to Haskell and have been practicing by doing some simple programming challenges. The last 2 days, I've been trying to implement the unbounded knapsack problem here. The algorithm I'm using is described on the wikipedia page, though for this problem the word 'weight' is replaced with the word 'length'. Anyways, I started by writing the code without memoization:

maxValue :: [(Int,Int)] -> Int -> Int
maxValue [] len = 0
maxValue ((l, val): other) len =
    if l > len then 
        skipValue
    else 
        max skipValue takeValue
    where skipValue = maxValue other len
          takeValue = val + maxValue ((l, val) : other) (len - l)

I had hoped that Haskell would have some nice syntax like #pragma memoize to help me, but when I looked around for examples, the solution was explained with this Fibonacci code:

memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
   where fib 0 = 0
         fib 1 = 1
         fib n = memoized_fib (n-2) + memoized_fib (n-1)

After grasping the concept behind this example, I was very disappointed - the method used is super hacky and only works if 1) the input to the function is a single integer, and 2) the function needs to compute the values recursively in the order f(0), f(1), f(2), ... But what if my parameters are vectors or sets? And if I want to memoize a function like f(n) = f(n/2) + f(n/3), I need to compute the value of f(i) for all i less than n, when I don't need most of those values. (Others have pointed out this claim is false)

I tried implementing what I wanted by passing a memo table that we slowly fill up as an extra parameter:

maxValue :: (Map.Map (Int, Int) Int) -> [(Int,Int)] -> Int -> (Map.Map (Int, Int) Int, Int)
maxValue m [] len = (m, 0)
maxValue m ((l, val) : other) len =
    if l > len then
        (mapWithSkip, skipValue)
    else
        (mapUnion, max skipValue (takeValue+val))
    where (skipMap, skipValue) = maxValue m other len
          mapWithSkip = Map.insertWith' max (1 + length other, len) skipValue skipMap
          (takeMap, takeValue) = maxValue m ([(l, val)] ++ other) (len - l)
          mapWithTake = Map.insertWith' max (1 + length other, len) (takeValue+val) mapWithSkip
          mapUnion = Map.union mapWithSkip mapWithTake

But this is too slow, I believe because Map.union takes too long: it's O(n+m) rather than O(min(n,m)). Furthermore, this code seems quite messy for something as simple as memoization. For this specific problem, you might be able to get away with generalizing the hacky approach to 2 dimensions and computing a bit extra, but I want to know how to do memoization in a more general sense. How can I implement memoization in this more general form while maintaining the same complexity as the code would have in imperative languages?

3
At a high level, your objections are completely reasonable (and the answer below addresses them well, I think). I want to object to just one low-level detail: if you memoized f(n) = f(n/2) + f(n/3) in the analogous way to the memoized_fib you showed, laziness would mean that you don't need to compute the value of f(i) for all i less than n, as you claim. - Daniel Wagner

3 Answers

5
votes

And if I want to memoize a function like f(n) = f(n/2) + f(n/3), I need to compute the value of f(i) for all i less than n, when I don't need most of those values.

No, laziness means that values that are not used never get computed. You allocate a thunk for them in case they are ever used, so it's a nonzero amount of CPU and RAM dedicated to this unused value, but e.g. evaluating f 6 never causes f 5 to be evaluated. So presuming that the expense of calculating an item is much higher than the expense of allocating a cons cell, and that you end up looking at a large percentage of the total possible values, the wasted work this method uses is small.
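To make this concrete, here is a minimal sketch of the same list-based trick applied to f(n) = f(n/2) + f(n/3). The base case f(0) = 1 and the names memoF and step are my own assumptions for illustration, and the trace call is only there so you can watch which entries are actually forced.

```haskell
import Debug.Trace (trace)

-- Lazily memoized f(n) = f(n `div` 2) + f(n `div` 3).
-- Assumption for illustration: f(0) = 1.
memoF :: Int -> Integer
memoF = (table !!)
  where
    -- One thunk per index; a thunk is only evaluated when it is looked up.
    table = map f [0 ..]

    f :: Int -> Integer
    f n = trace ("evaluating f " ++ show n) (step n)

    step 0 = 1
    step n = memoF (n `div` 2) + memoF (n `div` 3)
```

Evaluating memoF 6 should only trace the indices it depends on (6, 3, 2, 1 and 0); indices 4 and 5 get a thunk but are never evaluated. The list indexing is still linear in n, so this shows the laziness argument rather than an efficient table.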

But what if my parameters are vectors or sets?

Use the same technique, but with a different data structure than a list. A map is the most general approach, provided that your keys are Ord and also that you can enumerate all the keys you will ever need to look up.
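As a rough sketch of what that can look like: memoOver below is a hypothetical helper (not from any library) that builds a lazy Data.Map over an enumerated key set, and choose is just a small illustration with a pair as the key. It assumes 0 <= k <= n.

```haskell
import qualified Data.Map as Map  -- the lazy Map: values stay thunks until looked up

-- Hypothetical helper: tabulate f over the given keys. Keys outside the
-- table fall back to calling f directly (uncached).
memoOver :: Ord k => [k] -> (k -> v) -> (k -> v)
memoOver keys f = \k -> Map.findWithDefault (f k) k table
  where
    table = Map.fromList [(k, f k) | k <- keys]

-- Illustration with a tuple key: binomial coefficients via Pascal's rule.
-- Assumes 0 <= k <= n.
choose :: Int -> Int -> Integer
choose n k = lookupC (n, k)
  where
    -- Recursive calls go through lookupC, so each cell is computed at most once.
    lookupC = memoOver [(i, j) | i <- [0 .. n], j <- [0 .. i]] c
    c (_, 0) = 1
    c (i, j)
      | j == i    = 1
      | otherwise = lookupC (i - 1, j - 1) + lookupC (i - 1, j)
```

Only the cells that a lookup actually reaches are evaluated; the rest remain unevaluated thunks in the map.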

If you can't enumerate all the keys, or you plan to look up many fewer keys than the total number possible, then you can use State (or ST) to simulate the imperative process of sharing a writable memoization cache between invocations of your function.
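A minimal sketch of the ST variant, applied to the recursion from the question. The names maxValueST, goST and Cache are mine, the cache key is (number of remaining pieces, remaining length), and I'm assuming every piece length is at least 1 so the recursion terminates.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, modifySTRef')
import qualified Data.Map.Strict as Map

-- Cache keyed by (number of remaining pieces, remaining length).
type Cache s = STRef s (Map.Map (Int, Int) Int)

maxValueST :: [(Int, Int)] -> Int -> Int
maxValueST items total = runST (do
  cache <- newSTRef Map.empty
  goST cache (length items) items total)

-- n is always the length of the list argument; it is carried along so the
-- key does not have to be recomputed with `length` on every call.
goST :: Cache s -> Int -> [(Int, Int)] -> Int -> ST s Int
goST _ _ [] _ = pure 0
goST cache n lst@((l, val) : other) len = do
  cached <- Map.lookup (n, len) <$> readSTRef cache
  case cached of
    Just v  -> pure v
    Nothing -> do
      skipValue <- goST cache (n - 1) other len
      best <-
        if l > len
          then pure skipValue
          else do
            takeValue <- goST cache n lst (len - l)
            pure (max skipValue (val + takeValue))
      -- Record the result so later calls with the same key are a lookup.
      modifySTRef' cache (Map.insert (n, len) best)
      pure best
```

From the outside maxValueST is still a pure function; the mutable cache never escapes runST. Only keys that are actually visited end up in the map.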

I would have liked to show you how this works, but I find your problem statement / links confusing. The exercise you link to does seem to be equivalent to the UKP in the Wikipedia article you link to, but I don't see anything in that article that looks like your implementation. The "Dynamic programming in-advance algorithm" Wikipedia gives is explicitly designed to have the exact same properties as the fib memoization example you gave. The key is a single Int, and the array is built from left to right: starting with len=0 as the base case, and basing all other computations on already-computed values. It also, for some reason I don't understand, seems to assume you will have at least 1 copy of each legal-sized object, rather than at least 0; but that is easily fixed if you have different constraints.
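For reference, my reading of that left-to-right formulation can be sketched in the same style as the fib example. This is an assumption-laden sketch, not the Wikipedia code: it allows zero copies of each piece, assumes every length is at least 1 and total is non-negative, and the name best is mine.

```haskell
-- best items total: maximum value obtainable from a rod of length `total`,
-- where items is a list of (length, value) pairs.
-- Assumes every length is >= 1 and total >= 0.
best :: [(Int, Int)] -> Int -> Int
best items total = table !! total
  where
    -- table !! w is the answer for length w, defined in terms of smaller lengths.
    table = map cell [0 .. total]
    cell 0 = 0
    cell w = maximum (0 : [val + table !! (w - l) | (l, val) <- items, l <= w])
```

The list indexing makes each lookup linear in the length, so for large inputs you would want an array instead; the dependency structure stays the same.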

What you've implemented is totally different, starting from the total len, and choosing for each (length, value) step how many pieces of size length to cut up, then recursing with a smaller len and removing the front item from your list of weight-values. It's closer to the traditional "how many ways can you make change for an amount of currency given these denominations" problem. That, too, is amenable to the same left-to-right memoization approach as fib, but in two dimensions (one dimension for amount of currency to make change for, and another for number of denominations remaining to be used).
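A sketch of that two-dimensional table for your recursion, using a lazy Data.Map keyed by (index of the first piece still available, remaining length). The name maxValue2D is mine, and I'm assuming every length is at least 1 and total is non-negative.

```haskell
import qualified Data.Map as Map  -- lazy Map, so cells are evaluated on demand

maxValue2D :: [(Int, Int)] -> Int -> Int
maxValue2D items total = table Map.! (0, total)
  where
    n = length items

    table = Map.fromList (baseRow ++ otherRows)

    -- No pieces left to consider: value 0 for every length.
    baseRow = [((n, len), 0) | len <- [0 .. total]]

    otherRows =
      [ ((i, len), cell i item len)
      | (i, item) <- zip [0 ..] items
      , len <- [0 .. total]
      ]

    -- Same skip/take choice as the unmemoized maxValue, but every
    -- recursive call becomes a table lookup.
    cell i (l, val) len
      | l > len   = skipValue
      | otherwise = max skipValue (val + table Map.! (i, len - l))
      where
        skipValue = table Map.! (i + 1, len)
```

The table has (n+1) * (total+1) entries, but because the values are lazy, only the cells reachable from (0, total) are ever computed.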

3
votes

My go-to way to do memoization in Haskell is usually MemoTrie. It's pretty straightforward, it's pure, and it usually does what I'm looking for.

Without thinking too hard, you could produce:

import Data.MemoTrie (memo2)
maxValue :: [(Int,Int)] -> Int -> Int
maxValue = memo2 go
  where
    go [] len = 0
    go lst@((l, val):other) len =
      if l > len then skipValue else max skipValue takeValue
      where
        skipValue = maxValue other len
        takeValue = val + maxValue lst (len - l)

I don't have your inputs, so I don't know how fast this will go — it's a little strange to memoize the [(Int,Int)] input. I think you recognize this too because in your own attempt, you actually memoize over the length of the list, not the list itself. If you want to do that, it makes sense to convert your list to a constant-time-lookup array and then memoize. This is what I came up with:

import Data.MemoTrie (memo2)
import qualified GHC.Arr as Arr

maxValue :: [(Int,Int)] -> Int -> Int
maxValue lst = memoGo 0
  where
    n = length lst
    values = Arr.listArray (0, n - 1) lst
    -- Recursive calls go through memoGo so that sub-results are actually cached.
    memoGo = memo2 go
    go i _ | i >= n = 0
    go i len = if l > len then skipValue else max skipValue takeValue
      where
        (l, val) = values Arr.! i
        skipValue = memoGo (i+1) len
        takeValue = val + memoGo i (len - l)

1
votes

General, run-of-the-mill memoization in Haskell can be implemented the same way it is in other languages, by closing a memoized version of the function over a mutable map that caches the values. If you want the convenience of running the function as if it was pure, you'll need to maintain the state in IO and use unsafePerformIO.

The following memoizer will probably be sufficient for most code submission websites, as it depends only on System.IO.Unsafe, Data.IORef, and Data.Map.Strict, which should usually be available.

import qualified Data.Map.Strict as Map
import System.IO.Unsafe
import Data.IORef

memo :: (Ord k) => (k -> v) -> (k -> v)
memo f = unsafePerformIO $ do
  m <- newIORef Map.empty
  return $ \k -> unsafePerformIO $ do
    mv <- Map.lookup k <$> readIORef m
    case mv of
      Just v -> return v
      Nothing -> do
        let v = f k
        v `seq` modifyIORef' m $ Map.insert k v
        return v

From your question and comments, you seem to be the sort of person who's perpetually disappointed (!), so perhaps the use of unsafePerformIO will disappoint you, but if GHC actually provided a memoization pragma, this is probably what it would be doing under the hood.

For an example of straightforward use:

fib :: Int -> Int
fib = memo fib'
  where fib' 0 = 0
        fib' 1 = 1
        fib' n = fib (n-1) + fib (n-2)

main = do
  print $ fib 100000

or more to the point (SPOILERS?!), a version of your maxValue memoized in the length only:

maxValue :: [(Int,Int)] -> Int -> Int
maxValue values = go
  where go = memo (go' values)
        go' [] len = 0
        go' ((l, val): other) len =
          if l > len then
              skipValue
          else
              max skipValue takeValue
          where skipValue = go' other len
                takeValue = val + go (len - l)

This does a little more work than necessary, since the takeValue case re-evaluates the full set of marketable pieces, but it was fast enough to pass all the test cases on the linked web page. If it wasn't fast enough, then you'd need a memoizer that memoizes a function with results shared across calls with non-identical arguments (same length, but different marketable pieces, where you know the answer is going to be the same anyway because of special aspects of the problem and the order in which you check different marketable pieces and lengths). This would be a non-standard memoization, but it wouldn't be hard to modify the memo function to handle this case, I don't think, simply by splitting the argument up into a "key" argument and a "non-key" argument, or deriving the key from the argument via an arbitrary function supplied at memoization time.
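Here is a rough sketch of that last idea. memoOn is a hypothetical variant of the memo function above that caches on a key derived from the argument by a supplied function, and maxValueOn uses it with the key (index of the first remaining piece, remaining length). It has the same unsafePerformIO caveats as memo, and it assumes every piece length is at least 1.

```haskell
import qualified Data.Map.Strict as Map
import Data.IORef (newIORef, readIORef, modifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical variant of memo: results are cached under `key a`, so two
-- arguments with the same key share one cache entry. This is only sound if
-- f really returns the same result for all arguments with the same key.
{-# NOINLINE memoOn #-}
memoOn :: Ord k => (a -> k) -> (a -> v) -> (a -> v)
memoOn key f = unsafePerformIO $ do
  m <- newIORef Map.empty
  return $ \a -> unsafePerformIO $ do
    let k = key a
    mv <- Map.lookup k <$> readIORef m
    case mv of
      Just v -> return v
      Nothing -> do
        let v = f a
        v `seq` modifyIORef' m (Map.insert k v)
        return v

maxValueOn :: [(Int, Int)] -> Int -> Int
maxValueOn values total = go (values, total)
  where
    n = length values
    -- The key refers to n, so each piece list gets its own cache; the key
    -- would be meaningless if a cache were shared between different lists.
    keyOf (items, len) = (n - length items, len)
    go = memoOn keyOf go'
    go' ([], _) = 0
    go' (lst@((l, val) : other), len)
      | l > len   = skipValue
      | otherwise = max skipValue (val + go (lst, len - l))
      where
        skipValue = go (other, len)
```

Computing the key with length costs time linear in the number of remaining pieces on every call; carrying the index along as part of the argument would avoid that.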