6 votes

Chapter 8 of Real World Haskell defines the following function:

globToRegex' (c:cs) = escape c ++ globToRegex' cs

This function is not tail recursive, and the book says that this is fine because it relies on Haskell's non-strict (lazy) evaluation strategy. The (++) operator has the following simple definition, which is also not tail recursive:

(++) :: [a] -> [a] -> [a]
(x:xs) ++ ys = x : (xs ++ ys)
[]     ++ ys = ys

In a strict language, if we evaluate "foo" ++ "bar", the entire list is constructed, then returned. Non-strict evaluation defers much of the work until it is needed.

If we demand an element of the expression "foo" ++ "bar", the first pattern of the function's definition matches, and we return the expression x : (xs ++ ys). Because the (:) constructor is non-strict, the evaluation of xs ++ ys can be deferred: we generate more elements of the result at whatever rate they are demanded. When we generate more of the result, we will no longer be using x, so the garbage collector can reclaim it. Since we generate elements of the result on demand, and do not hold onto parts that we are done with, the compiler can evaluate our code in constant space.

(Emphasis added.)

The explanation above in bold is essential to Haskell, but:

  1. How can we make sense of it?
  2. What happens under the hood?
  3. How does x : (xs ++ ys) evaluate in constant space? That sounds like what tail recursion does!
It sounds to me like the constant space comes from throwing away x when generating the next xs on demand. – sarnold

3 Answers

8 votes

Remember that "foo" is just syntactic sugar for 'f':'o':'o':[].

That is, String is just an alias for [Char] which is just a linked list of characters.

When client code is consuming the linked list, it decomposes it back into a head and tail (e.g. x:xs), does something with the head (if desired), and then recurses for the tail.

When your code is constructing the linked list, because of lazy evaluation, all it needs to do is return a thunk or promise that it will return a linked list when asked for. When the head is dereferenced, it is supplied on demand, and the tail is left as a promise for the rest of the list.

It should be easy to see that as long as the list is not copied or otherwise stored, each thunk will get used once and then discarded, so that the overall storage space is constant.
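To make this concrete, here is a sketch (my own trace, not from the book) of how demand drives the evaluation of "foo" ++ "bar", one element at a time:

```haskell
-- Each step happens only when the consumer demands the next element.
-- Once a head like 'f' has been consumed, nothing references it any
-- more, so the garbage collector can reclaim it immediately.
--
--   "foo" ++ "bar"
-- = 'f' : ("oo" ++ "bar")        -- first equation of (++); tail is a thunk
-- = 'f' : 'o' : ("o" ++ "bar")   -- forced only when the next element is demanded
-- = 'f' : 'o' : 'o' : ([] ++ "bar")
-- = 'f' : 'o' : 'o' : "bar"      -- second equation: the recursion ends

main :: IO ()
main = putStrLn ("foo" ++ "bar")  -- prints foobar
```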

Many strict languages expose a mechanism (often called a generator) to accomplish the same kind of lazy list generation, but with lazy evaluation such features come "for free" as part of the language -- in essence, all Haskell lists are generators!
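As a small illustration of that analogy (my own example, with a hypothetical name), an infinite list acts as a generator: it is defined up front, but only as much of it as the consumer demands is ever built:

```haskell
-- An "infinite generator" of powers of two. No element is computed
-- until a consumer demands it.
powersOfTwo :: [Integer]
powersOfTwo = iterate (*2) 1

main :: IO ()
main = print (take 5 powersOfTwo)  -- only five elements are ever built
                                   -- prints [1,2,4,8,16]
```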

7 votes

Relying on lazy evaluation rather than tail recursion is a characteristic of Haskell in comparison to other FP languages. The two play related roles in terms of limiting memory usage; which one is the appropriate mechanism depends on the data being produced.

If your output may be incrementally consumed, then you should prefer to take advantage of lazy evaluation, as output will only be generated as it is required, thus limiting heap consumption. If you eagerly construct the output, then you are resigning yourself to using heap, but can at least conserve stack by being tail recursive.

If your output cannot be incrementally consumed -- perhaps you are computing an Int -- then laziness can leave you with an unwanted pile of thunks whose evaluation will blow your stack. In this case, a strict accumulator and tail recursion are called for.

So, if you are eager you might waste heap building a big data structure. If you are lazy, you might defer simplification (e.g. reducing 1 + 1 to 2) to the heap, only to eventually sully your stack when paying the piper.

To play with both sides of the coin, ponder foldl' and foldr.
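Here is a sketch of both sides (my own example, assuming GHC's Data.List.foldl'): foldl' with a strict accumulator for a non-incremental result, and foldr for output that is consumed incrementally:

```haskell
import Data.List (foldl')

-- A single Int result: accumulate strictly with foldl' so no chain
-- of (+) thunks builds up on the heap.
total :: Int
total = foldl' (+) 0 [1..1000000]

-- An incrementally consumable list: foldr cooperates with laziness,
-- producing each cons cell on demand (this is essentially how map works).
doubled :: [Int] -> [Int]
doubled = foldr (\x acc -> 2 * x : acc) []

main :: IO ()
main = do
  print total
  print (take 3 (doubled [1..]))  -- works even on an infinite input
```

Note that swapping the folds breaks both: foldr for total builds a deep thunk chain, and foldl' for doubled diverges on an infinite input because it must reach the end of the list first.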

5 votes

Tail recursion would keep the stack constant, but in a strict language the heap would grow as x : (xs ++ ys) was computed. In Haskell, because it is non-strict, x would be freed before the next value was computed (unless the caller held a reference to x unnecessarily), so the heap would also be constant.
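As a rough illustration (my own sketch, not from the answer above): consuming a large concatenation element by element keeps heap residency flat, because each cons cell becomes garbage as soon as it has been consumed:

```haskell
-- length forces the list one cons cell at a time; once a cell has
-- been counted, nothing references it, so it is reclaimed. Heap
-- residency stays constant even though ten million cells are
-- produced in total.
main :: IO ()
main = print (length (replicate 10000000 'x' ++ "done"))  -- prints 10000004
```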