14 votes

I have seen many talks and read many blog posts saying that you should have strict fields in your data types to avoid various performance issues, e.g.:

data Person = Person
    { personName     :: !Text
    , personBirthday :: !UTCTime
    }

This makes total sense to me. As the functions operating on that data are lazy, composability isn't sacrificed.

Yet if I add a Maybe field:

data Person = Person
    { personName     :: !Text
    , personBirthday :: !UTCTime
    , personAddress  :: !(Maybe Address)
    }

I'm introducing laziness into the data structure; after all, Maybe is a control structure. Can't an unevaluated thunk hide behind the Just constructor?
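To make that worry concrete, here is a minimal sketch (with a cut-down Person and a hypothetical Address synonym, both just for illustration): the bang forces the field only to weak head normal form, so the Just constructor is exposed but its payload stays an unevaluated thunk.

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

type Address = Text  -- hypothetical stand-in, only for this sketch

data Person = Person
    { personName    :: !Text
    , personAddress :: !(Maybe Address)
    }

-- The strict field evaluates the Maybe only to its outermost constructor,
-- so an arbitrary thunk can still hide behind Just:
p :: Person
p = Person "Alice" (Just (error "not forced by the strict field"))

-- Constructing and forcing p succeeds; the error only fires when someone
-- demands the address itself, e.g.
--   case personAddress p of Just a -> Data.Text.length a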

However, there is a strict Maybe in the strict package, or via strict-base-types. But judging by the reverse dependencies (strict, strict-base-types), they aren't widely used.
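For comparison, here is what the field would look like with the strict package, assuming its Data.Strict.Maybe module (whose Just constructor has a strict field); the Address type is hypothetical, just to keep the sketch self-contained:

import           Data.Text (Text)
import           Data.Time (UTCTime)
import qualified Data.Strict.Maybe as S

newtype Address = Address Text  -- hypothetical, only for this sketch

data Person = Person
    { personName     :: !Text
    , personBirthday :: !UTCTime
    , personAddress  :: !(S.Maybe Address)  -- no thunk can hide behind S.Just
    }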

So the question: why should or shouldn't one use a strict Maybe in non-control data definitions?


2 Answers

2 votes

Reasons to use strict Either/Maybe/Tuple types:

  • If you profile your code and notice a space leak, a strict type can be the way to plug it (see the sketch after this list)

  • Strict data types are widely regarded as useful for high-performance code, as witnessed by the Strict and StrictData language extensions added in GHC 8.0

  • Other people are doing it too (those strict packages might not be popular, but they exist for a reason – you could also argue that the kind of applications where you would need those strict packages probably wouldn't get uploaded to Hackage)
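As an illustration of the first point, here is a hedged sketch of the kind of leak involved (the names are made up for the example): foldl' forces its accumulator only to weak head normal form, and Just <thunk> already is WHNF, so work piles up inside the Just; forcing the payload, or using a Maybe whose Just field is strict, keeps the accumulator flat.

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Leaky running maximum: foldl' sees only the Just constructor, so a chain
-- of unevaluated 'max' thunks grows underneath it.
leakyMax :: [Int] -> Maybe Int
leakyMax = foldl' step Nothing
  where
    step Nothing  x = Just x
    step (Just m) x = Just (max m x)

-- Plugged: the payload is forced before being wrapped (a strict Maybe,
-- whose Just field is strict, would have the same effect).
strictMax :: [Int] -> Maybe Int
strictMax = foldl' step Nothing
  where
    step Nothing  x = Just x
    step (Just m) x = let !m' = max m x in Just m'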

Reasons not to:

Overall I don't think there's dogma going one way or another. It's just a matter of convenience.

2 votes

It's not as simple as "strict is fast, lazy is slow."

Both laziness and strictness are helpful for performance gains; let's see how laziness can be:

  • Obviously sum $ take 10 $ [1..] takes infinite time and infinite memory if the list is strict but finite time and constant memory if the list is lazy.

  • Functional data structures usually do not admit nice amortized bounds. What's an "amortized" bound? When you pay a cost of O(f(n)) only after O(n) other completely-unrelated steps, we can imaginatively reconceptualize this as paying O(f(n)/n) for each of those steps: so if you add, say, n elements to a list and then sort it once in n log n time, you can reconceptualize this as every addition taking log n time. (Which it does if you back it with a self-balancing binary search tree instead of a list, but we can say that even with a list the cost is log n amortized.)

    The problem with combining this with functional programming is that there's a general promise that when I give you a new data structure, the old one is not modified. So, as a point of general theory, if it costs a lot to transform some X, then there exists a valid usage pattern which spends n effort to build X as before, but then uses it m times in different ways (because it's not modified!), incurring that O(f(n)) cost each time: now when you try to amortize you only get O(m f(n)/n), and if m were, say, to scale proportionally to n, you've gone from incurring this cost once for the whole structure to incurring it basically once per addition to the data structure. You can't say "oh, that's not my usage case" when you're creating a data structure: even if it isn't yours, it is probably eventually someone else's.

    Okasaki points out (in his thesis (PDF)) that laziness (with memoization) is actually exactly what we need to close this gap: suppose X has its postprocessed version stored as a lazy value; then each call to transform X will hit the same lazy value and yield the same memoized answer. So: if you can cleverly move this work into the thunks, then the fact that Haskell doesn't recompute thunks can be used to make memoization arguments (a minimal sketch follows this list).

  • For another example, ++ in Haskell is an O(1) operation; with strict lists, appending a list of size n onto the end of a list of size m requires O(m) memory allocations up-front, as the front list needs to be completely rebuilt; lazy stream concatenation transforms this into O(m) conditional operations (which fortunately play really nicely with the processor's branch predictor!) and distributes that cost over every read of the list.

  • Laziness has a great potential if you're not using a bunch of the data, but you don't know which stuff you're using or not. To take a simple example, if you had to repeatedly invert some expensive monotonic function which was hard to predict on some bounded interval, you might not have a closed form for the inverse or a fast expression for the function or its derivative (so as to use Newton-Raphson). Instead, you could possibly build a big binary search tree of constant depth whose nodes are annotated with f(x) and whose leaves represent x; then you invert f for some input x by calculating f(x) and binary-searching for x. Each query will then get automatically memoized, so searching for values near other ones gets an asymptotic speedup due to the same caching (at the potential cost of ever-increasing memory).
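A minimal sketch of that evaluate-at-most-once behaviour (not taken from the answer above, just an illustration): the classic self-referential Fibonacci list, where each cell is a thunk that is computed once and then shared by every later lookup.

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- The first lookup pays for the cells up to n; later lookups at or below n
-- reuse the already-evaluated thunks.
memoFib :: Int -> Integer
memoFib n = fibs !! n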

So, when does strictness help?

The real case where you want to remove laziness is recursive data structures, and even then, this only applies if you know that you want the whole data structure to be available in memory (i.e. you're going to use all of it). Such data structures are usually spine-strict: e.g. a list which contains thunks for the actual values but the pointers to other list nodes are all 100% strict.
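A short sketch of that spine-strict shape (an illustrative type, not a library one): the links between nodes carry strictness annotations, so forcing the outermost constructor materialises the whole spine, while the elements themselves stay lazy.

data SpineStrictList a
    = Nil
    | Cons a !(SpineStrictList a)  -- strict spine, lazy elements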

When these conditions are both true, then it makes no real sense to put laziness on each of these nodes as it provides an extra O(n) cost to evaluate all of these thunks and might blow up the call stack to your recursion limit if you don't use strictness annotations to keep it down. If you're not 100% clear on this, the best explanations I've seen for how this happens are ones like this, justifying the need for foldl' in cases where both foldr and foldl overflow the call stack for different reasons. Those explanations are usually very practical.
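A minimal sketch of that foldl / foldl' distinction:

import Data.List (foldl')

lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- builds ((0+1)+2)+... as one big chain of thunks
strictSum = foldl' (+) 0   -- forces the running total at every step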

Strictness also can put a bunch of the cost up-front, like when you want to make a game: if you generate the game-world lazily, then you might notice "buffering" when you walk into a totally new area; but if you can generate these things strictly in advance, you have to pay an earlier cost but you get a later benefit. People don't mind waiting when they hit the "Load Game" button, they really hate waiting when it breaks immersion in some story. (Actually parallel laziness is truly ideal for this: you want to be able to force the thunk in the background, before you need it and while the action is a little lighter, so that the results are available by the time you want them. Even then, though -- I mean, that's how TES3: Morrowind worked, but they included a set of scrolls as a gag which allows you to jump halfway across the game-world if you can survive the landing -- and the speeds you'd get while doing that meant that you would fly through the regions faster than the system could load them, so it would constantly give you 3 seconds of midair flight before stopping for 2 to say "Loading...", over and over, as you crossed the game world this way. Nothing can really prevent that problem.)

When I need to fix it, how can I fix it?

So: we've learned that a typical Maybe somewhere is not going to create a sizeable cost for your application. That's why nobody cares.

What about when we create a recursive data structure, like the alternative list type data NonNullList x = NNL x !(Maybe (NonNullList x)), which must necessarily always have at least one element? In this case the recursion lives in the Maybe, and how do we fix that?

Yes, you can use a strict Maybe. However, you can also inline the structure to make it strict. In this case you would write:

data NonNullList x = End x | Continue x !(NonNullList x)

If there's too much repeated info in your data structure (perhaps we are storing a lot of metadata in our data structure) and too many Maybe (MyDataStructure x) fields, then we might eventually want a data MyDataStructureDescriptor = MDSD { property1 :: !String, property2 :: !Int, ...} so that the many repetitions of this descriptor can be reduced to one common format. This can actually be really nice for organizing your code, too.
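A hypothetical sketch of that descriptor refactoring (all names are made up, following the ones in the paragraph above): the metadata that would otherwise be repeated in every node lives once in a strict record that each node references.

data MyDataStructureDescriptor = MDSD
    { property1 :: !String
    , property2 :: !Int
    }

data MyDataStructure x
    = Leaf   !MyDataStructureDescriptor x
    | Branch !MyDataStructureDescriptor (MyDataStructure x) (MyDataStructure x)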