Data value dependencies, updates and memoisation

Question

I'm sorry this problem description is so abstract: it's for my job, and for commercial confidentiality reasons I can't give the real-world problem, just an abstraction.

I've got an application that receives messages containing key-value pairs. The keys are from a defined set of keywords, and each keyword has a fixed data type. So if "Foo" is an Integer and "Bar" is a date you might get a message like:

Foo: 234
Bar: 24 September 2011

A message may have any subset of keys in it. The number of keys is fairly large (several dozen). But let's stick with Foo and Bar for now.

Obviously there is a record like this corresponding to the messages:

data MyRecord = MyRecord {
   foo :: Maybe Integer,
   bar :: Maybe UTCTime
   -- ... and so on for several dozen fields.
}

The record uses "Maybe" types because that field may not have been received yet.

I also have many derived values that I need to compute from the current values (if they exist). For instance I want to have

baz :: MyRecord -> Maybe String
baz r = do -- Maybe monad
   f <- foo r
   b <- bar r
   return $ show f ++ " " ++ show b

Some of these functions are slow, so I don't want to repeat them unnecessarily. I could recompute baz for each new message and memo it in the original structure, but if a message leaves the foo and bar fields unchanged then that is wasted CPU time. Conversely I could recompute baz every time I want it, but again that would waste CPU time if the underlying arguments have not changed since last time.

What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change. I could detect this manually by noting that baz depends only on foo and bar, and so only recomputing it on messages that change those values, but for complicated functions that is error-prone.
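To make the manual approach concrete, here is a minimal sketch of what I mean. The Message type, the bazMemo field and computeBaz are hypothetical names for illustration only, and bar is a String rather than a UTCTime just to keep the sketch free of the time package.

```haskell
import Control.Monad (mplus)

-- Cut-down stand-in for the real record, plus a slot for the memoised baz.
data MyRecord = MyRecord
  { foo     :: Maybe Integer
  , bar     :: Maybe String
  , bazMemo :: Maybe String
  } deriving (Eq, Show)

-- Hypothetical message: any subset of the keys may be present.
data Message = Message
  { msgFoo :: Maybe Integer
  , msgBar :: Maybe String
  }

-- Stand-in for the slow derived computation.
computeBaz :: Maybe Integer -> Maybe String -> Maybe String
computeBaz mf mb = do
  f <- mf
  b <- mb
  return (show f ++ " " ++ b)

-- Apply a message, recomputing baz only when foo or bar actually changed.
applyMessage :: Message -> MyRecord -> MyRecord
applyMessage m r
  | foo' == foo r && bar' == bar r = r   -- inputs unchanged: keep the memo
  | otherwise = r { foo = foo', bar = bar', bazMemo = computeBaz foo' bar' }
  where
    foo' = msgFoo m `mplus` foo r        -- new value if present, else old
    bar' = msgBar m `mplus` bar r
```

The weakness is visible in applyMessage: the list of fields that baz depends on is written out by hand, separately from baz itself, so the two can drift apart.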

An added wrinkle is that some of these functions may have multiple strategies. For instance you might have a value that can be computed from either Foo or Bar using 'mplus'.
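As an illustration of the multiple-strategy case, a sketch might look like the following. The name qux and the two strategies are made up for this example, and bar is again a String only to avoid depending on the time package.

```haskell
import Control.Monad (mplus)

-- Cut-down stand-in for the real record.
data MyRecord = MyRecord
  { foo :: Maybe Integer
  , bar :: Maybe String
  }

-- Hypothetical derived value with two strategies: prefer the one based
-- on foo, and fall back to the one based on bar if foo is missing.
qux :: MyRecord -> Maybe String
qux r = fromFoo `mplus` fromBar
  where
    fromFoo = fmap (\f -> "from foo: " ++ show f) (foo r)
    fromBar = fmap (\b -> "from bar: " ++ b) (bar r)
```

Here the set of fields that matter depends on which fields are present, which makes tracking the dependencies by hand even harder.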

Does anyone know of an existing solution to this? If not, how should I go about it?

An obvious point - can you live with the derived data in a Map indexed by the "known" intrinsic data rather than embed both types of data inside a record? It would be much easier to think about when and how to recalculate if you can manage your data like this. - stephen tetley
That would force all derived data to be the same type, which would probably mean smashing it into a string. It's a fallback position, but not one that I want. OTOH maybe there is some solution involving "forall" that can solve this. - Paul Johnson

sclv · Accepted Answer · 2012-05-18T23:35:40

I'll assume that you have one "state" record and that these messages can update existing values in it as well as set them for the first time. So if Foo is 12, it may later be 23, and therefore the output of baz would change. If any of this is not the case, then the answer becomes pretty trivial.

Let's start with the "core" of baz -- a function not on a record, but the values you want.

baz :: Int -> Int -> String

Now let's transform it:

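-- A function paired with its most recent (argument, result), if any.
data Cached a b = Cached (Maybe (a,b)) (a -> b)

-- Reuse the stored result when the argument is unchanged; otherwise
-- recompute and return a cache holding the new pair.
getCached :: Eq a => Cached a b -> a -> (b,Cached a b)
getCached c@(Cached (Just (arg,res)) _) x | x == arg = (res,c)
getCached (Cached _ f) x = let ans = f x in (ans,Cached (Just (x,ans)) f)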

bazC :: Cached (Int,Int) String
bazC = Cached Nothing (uncurry baz)

Now whenever you would use a normal function, you use a cache-transformed function instead, substituting the resulting cache-transformed function back into your record. This is essentially a manual memotable of size one.
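A minimal sketch of how that substitution might look. The AppState record, its bazMemo field, emptyState and lookupBaz are hypothetical names, and the body of baz is only a placeholder for the slow function; Cached and getCached are repeated so the sketch compiles on its own.

```haskell
data Cached a b = Cached (Maybe (a, b)) (a -> b)

getCached :: Eq a => Cached a b -> a -> (b, Cached a b)
getCached c@(Cached (Just (arg, res)) _) x | x == arg = (res, c)
getCached (Cached _ f) x = let ans = f x in (ans, Cached (Just (x, ans)) f)

-- Placeholder for the slow core function.
baz :: Int -> Int -> String
baz f b = show f ++ " " ++ show b

-- Hypothetical state record: raw fields plus the cache for baz.
data AppState = AppState
  { foo     :: Maybe Int
  , bar     :: Maybe Int
  , bazMemo :: Cached (Int, Int) String
  }

emptyState :: AppState
emptyState = AppState Nothing Nothing (Cached Nothing (uncurry baz))

-- Ask for baz. Returns the value (if both inputs are present) together
-- with the state carrying the updated cache, which the caller must keep.
lookupBaz :: AppState -> (Maybe String, AppState)
lookupBaz s = case (,) <$> foo s <*> bar s of
  Nothing   -> (Nothing, s)
  Just args ->
    let (res, c') = getCached (bazMemo s) args
    in  (Just res, s { bazMemo = c' })
```

Messages can then update foo and bar freely without touching bazMemo; the comparison inside getCached decides at lookup time whether a recomputation is needed.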

For the basic case you describe, this should be fine.

A fancier and more generalized solution involving a dynamic graph of dependencies goes under the name "incremental computation" but I've seen research papers for it more than serious production implementations. You can take a look at these for starters, and follow the reference trail forward:

Incremental computation is also closely related to functional reactive programming, so you can take a look at Conal Elliott's papers on that, or play with Heinrich Apfelmus' reactive-banana library: http://www.haskell.org/haskellwiki/Reactive-banana

In imperative languages, take a look at Trellis in Python: http://pypi.python.org/pypi/Trellis or Cells in Lisp: http://common-lisp.net/project/cells/
