1
votes

I'm sorry this problem description is so abstract: it's for my job, and for commercial confidentiality reasons I can't give the real-world problem, just an abstraction.

I've got an application that receives messages containing key-value pairs. The keys are from a defined set of keywords, and each keyword has a fixed data type. So if "Foo" is an Integer and "Bar" is a date you might get a message like:

Foo: 234
Bar: 24 September 2011

A message may have any subset of keys in it. The number of keys is fairly large (several dozen). But let's stick with Foo and Bar for now.

Obviously there is a record like this corresponding to the messages:

data MyRecord = MyRecord {
   foo :: Maybe Integer,
   bar :: Maybe UTCTime
   -- ... and so on for several dozen fields.
}

The record uses "Maybe" types because that field may not have been received yet.

I also have many derived values that I need to compute from the current values (if they exist). For instance I want to have

baz :: MyRecord -> Maybe String
baz r = do -- Maybe monad
   f <- foo r
   b <- bar r
   return $ show f ++ " " ++ show b

Some of these functions are slow, so I don't want to repeat them unnecessarily. I could recompute baz for each new message and memo it in the original structure, but if a message leaves the foo and bar fields unchanged then that is wasted CPU time. Conversely I could recompute baz every time I want it, but again that would waste CPU time if the underlying arguments have not changed since last time.

What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change. I could detect this manually by noting that baz depends only on foo and bar, and so only recomputing it on messages that change those values, but for complicated functions that is error-prone.

An added wrinkle is that some of these functions may have multiple strategies. For instance you might have a value that can be computed from either Foo or Bar using 'mplus'.
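
To make the multiple-strategy case concrete, here is a rough sketch. The name `qux` and both strategies are made up for illustration; the real functions are more involved.

```haskell
import Control.Monad (mplus)
import Data.Time (UTCTime)

data MyRecord = MyRecord
  { foo :: Maybe Integer
  , bar :: Maybe UTCTime
  }

-- Hypothetical derived value with two strategies: try the Foo-based one
-- first and fall back to the Bar-based one if Foo has not been received.
qux :: MyRecord -> Maybe String
qux r = fromFoo `mplus` fromBar
  where
    fromFoo = fmap show (foo r)
    fromBar = fmap show (bar r)
```

Here `qux` depends on foo alone when foo is present, and on bar only otherwise, so the set of fields it depends on changes with the data.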

Does anyone know of an existing solution to this? If not, how should I go about it?

3
An obvious point: can you live with the derived data in a Map indexed by the "known" intrinsic data rather than embed both types of data inside a record? It would be much easier to think about when and how to recalculate if you can manage your data like this. - stephen tetley
That would force all derived data to be the same type, which would probably mean smashing it into a string. It's a fallback position, but not one that I want. OTOH maybe there is some solution involving "forall" that can solve this. - Paul Johnson

3 Answers

2
votes

I'll assume that you have one "state" record and these messages all involve updating it as well as setting it. So if Foo is 12, it may later be 23, and therefore the output of baz would change. If any of this is not the case, then the answer becomes pretty trivial.

Let's start with the "core" of baz -- a function not on a record, but the values you want.

baz :: Int -> Int -> String

Now let's transform it:

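-- The last (argument, result) pair, if any, plus the function being cached.
data Cached a b = Cached (Maybe (a,b)) (a -> b)
getCached :: Eq a => Cached a b -> a -> (b,Cached a b)
-- Cache hit: same argument as last time, so reuse the stored result.
getCached c@(Cached (Just (arg,res)) _) x | x == arg = (res,c)
-- Cache miss: compute the result and remember it alongside its argument.
getCached (Cached _ f) x = let ans = f x in (ans,Cached (Just (x,ans)) f)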

bazC :: Cached (Int,Int) String
bazC = Cached Nothing (uncurry baz)

Now whenever you would use a normal function, you use a cache-transformed function instead, substituting the resulting cache-transformed function back into your record. This is essentially a manual memotable of size one.
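
As a rough sketch of what "substituting the cache back into your record" could look like: the `AppState` record, `emptyState`, and `getBaz` below are hypothetical names, and the body of `baz` is only a cheap stand-in for the slow function.

```haskell
-- One-entry memo table: the last (argument, result) pair plus the function.
data Cached a b = Cached (Maybe (a, b)) (a -> b)

getCached :: Eq a => Cached a b -> a -> (b, Cached a b)
getCached c@(Cached (Just (arg, res)) _) x
  | x == arg = (res, c)                      -- hit: reuse the stored result
getCached (Cached _ f) x =
  let ans = f x in (ans, Cached (Just (x, ans)) f)  -- miss: compute and store

-- Stand-in for the slow computation (assumption, not the real baz).
baz :: Int -> Int -> String
baz f b = show f ++ " " ++ show b

-- Hypothetical state record: the raw fields plus the cache for baz.
data AppState = AppState
  { foo  :: Maybe Int
  , bar  :: Maybe Int
  , bazC :: Cached (Int, Int) String
  }

emptyState :: AppState
emptyState = AppState Nothing Nothing (Cached Nothing (uncurry baz))

-- Ask for baz; return the answer together with the state holding the
-- (possibly refreshed) cache. The caller must keep using the new state.
getBaz :: AppState -> (Maybe String, AppState)
getBaz s = case (,) <$> foo s <*> bar s of
  Nothing   -> (Nothing, s)
  Just args ->
    let (res, c') = getCached (bazC s) args
    in (Just res, s { bazC = c' })
```

A message handler can then update `foo` and `bar` freely; `baz` is only rerun when `getBaz` is called with a (foo, bar) pair different from the one stored in the cache.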

For the basic case you describe, this should be fine.

A fancier and more generalized solution involving a dynamic graph of dependencies goes under the name "incremental computation" but I've seen research papers for it more than serious production implementations. You can take a look at these for starters, and follow the reference trail forward:

  1. http://www.carlssonia.org/ogi/Adaptive/
  2. http://www.andres-loeh.de/Incrementalization/paper_final.pdf

Incremental computation is also closely related to functional reactive programming, so you can take a look at Conal Elliott's papers on that, or play with Heinrich Apfelmus' reactive-banana library: http://www.haskell.org/haskellwiki/Reactive-banana

In imperative languages, take a look at Trellis in Python: http://pypi.python.org/pypi/Trellis or Cells in Lisp: http://common-lisp.net/project/cells/

1
votes

You can build a stateful graph that corresponds to computations you need to do. When new values appear you push these into the graph and recompute, updating the graph until you reach the outputs. (Or you can store the value at the input and recompute on demand.) This is a very stateful solution but it works.
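
A minimal sketch of such a graph, assuming mutable cells built on `IORef`. All the names here (`Node`, `Input`, `dependsOn`, and so on) are invented for illustration and do not come from any library. Writing an input pushes an invalidation to its dependents, and a derived node recomputes on demand only when it is stale.

```haskell
import Control.Monad (when)
import Data.IORef

-- A derived node: a cached value (Nothing means stale) and its recipe.
data Node a = Node
  { cache     :: IORef (Maybe a)
  , recompute :: IO a
  }

newNode :: IO a -> IO (Node a)
newNode act = do
  ref <- newIORef Nothing
  return (Node ref act)

-- Pull: reuse the cached value if present, otherwise recompute and store it.
readNode :: Node a -> IO a
readNode n = do
  c <- readIORef (cache n)
  case c of
    Just v  -> return v
    Nothing -> do
      v <- recompute n
      writeIORef (cache n) (Just v)
      return v

-- Push: mark a derived node as stale.
invalidate :: Node a -> IO ()
invalidate n = writeIORef (cache n) Nothing

-- An input cell plus the invalidation actions of everything that uses it.
data Input a = Input
  { value      :: IORef a
  , dependents :: IORef [IO ()]
  }

newInput :: a -> IO (Input a)
newInput x = Input <$> newIORef x <*> newIORef []

readInput :: Input a -> IO a
readInput = readIORef . value

-- Register a derived node as depending on an input.
dependsOn :: Node b -> Input a -> IO ()
dependsOn n i = modifyIORef (dependents i) (invalidate n :)

-- Writing an unchanged value does nothing; a changed value invalidates
-- every registered dependent.
writeInput :: Eq a => Input a -> a -> IO ()
writeInput i x = do
  old <- readIORef (value i)
  when (old /= x) $ do
    writeIORef (value i) x
    readIORef (dependents i) >>= sequence_
```

In this sketch the dependencies are declared by hand with `dependsOn`, so it does not remove the risk of forgetting one; the incremental-computation libraries track them automatically.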

Are you perhaps creating market data, like yield curves, from live inputs of rates etc.?

1
votes

"What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change."

It sounds to me like you want a variable that is sort of immutable, but allows a one-time mutation from "nothing computed yet" to "computed". Well, you're in luck: this is exactly what lazy evaluation gives you! So my proposed solution is quite simple: just extend your record with fields for each of the things you want to compute. Here's an example of such a thing, where the CPU-intensive task we're doing is breaking some encryption scheme:

data Foo = Foo
    { ciphertext :: String
    , plaintext :: String
    }

-- a smart constructor for Foo values; crack is the expensive function
foo :: String -> Foo
foo c = Foo { ciphertext = c, plaintext = crack c }

The point here is that calls to foo have expenses like this:

  1. If you never ask for the plaintext of the result, it's cheap.
  2. On the first call to plaintext, the CPU churns a long time.
  3. On subsequent calls to plaintext, the previously computed answer is returned immediately.
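
Here is a runnable sketch of the same idea. The `crack` below is a cheap ROT13 stand-in for the expensive computation, and `setCiphertext` is a hypothetical helper. It shows one thing to watch for when inputs change over time: a plain record update such as `x { ciphertext = c }` would leave the old `plaintext` in place, so updates should go back through the smart constructor.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, ord)

data Foo = Foo
  { ciphertext :: String
  , plaintext  :: String   -- lazy field: an unevaluated thunk until demanded
  }

-- Stand-in for the expensive computation (assumption: ROT13, not a real
-- cracking routine).
crack :: String -> String
crack = map rot13
  where
    rot13 ch
      | isAsciiLower ch = rotate 'a' ch
      | isAsciiUpper ch = rotate 'A' ch
      | otherwise       = ch
    rotate base ch = chr (ord base + (ord ch - ord base + 13) `mod` 26)

-- Smart constructor: plaintext is defined here but not computed until used.
foo :: String -> Foo
foo c = Foo { ciphertext = c, plaintext = crack c }

-- Hypothetical helper: changing the input rebuilds the record through the
-- smart constructor, so the derived field can never go stale.
setCiphertext :: String -> Foo -> Foo
setCiphertext c _ = foo c
```

With several dozen inputs, the same pattern would mean one constructor that takes all the raw fields and defines every derived field lazily. Derived fields are then recomputed (on demand) after any rebuild, even if their own inputs did not change.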