Why does foldl seems to be harmful despite being tail-recursive?

Question

I've always believed that tail-recursive functions are better in terms of performance than non tail-recursive versions. So, counting items in a list might be implemented like so:

count:: [a] -> Int
count [] = 0
count (x:xs) = 1 + count xs

But this function is not tail recursive, and so is not as performant as possible. The fix is to accumulate counts, like so:

_count:: Num b => b -> [a] -> b
_count b [] = b
_count b (x:xs) = _count (b + 1) xs

count:: [a] -> Int
count = _count 0

This can be easily implemented with a tail-recursive fold:

myfold:: (b -> a -> b) -> b -> [a] -> b
myfold f b [] = b
myfold f b (x:xs) = myfold f (f b x) xs

count = myfold incr 0
   where incr c _ = c + 1

But, then I remembered something about left vs right folds. It turned out myfold is a left fold, which according to Real World Haskell shouldn't be used:

This is convenient for testing, but we will never use foldl in practice.

...because of the thunking of the application of f b x.

So, I tried to rewrite myfold as a right fold:

myfoldr:: (a -> b -> b) -> b -> [a] -> b
myfoldr f b [] = b
myfoldr f b (x:xs) = f x (myfoldr f b xs)

But that's not tail-recursive.

It seems, then, that Haskell non-strict evaluation makes tail-recursiveness less important. Yet, I have this feeling that for counting items in lists a strict foldl should perform better than any foldr, because there's no way we can extract anything from an Integer.

To sum up, I think these are the better implementations (using folds) for map and count:

map:: (a -> b) -> [a] -> [b]
map f = foldr g []
  where g x fxs = (f x):fxs

count:: [a] -> Int
count = foldl incr 0
  where incr c _ = c + 1

Is this correct?

IIRC, the reason that RWH states that we'll never use foldl in practice is because you should use foldl' instead. — Mark Seemann
If you are using a left fold to produce a lazy data structure (which, in particular, you need to consume lazily, typically because it's too large to manifest in memory) and you need to do some computation in between each step which you want to perform immediately - not when the result is actually evaluated - then using foldl is appropriate. But these are very restrictive conditions, so in practice, foldl is never used. — user2407038
A rough guideline is: if your fold returns some type which is represented in constant-space (e.g. Int), prefer foldl' so that the whole fold requires constant space. Otherwise, if the return type is a lazy list/tree/whatever, prefer foldr, so that it can be consumed lazily. — chi

Joachim Breitner Joachim Breitner · Accepted Answer · 2018-01-22T14:07:05

It seems, then, that Haskell non-strict evaluation makes tail-recursiveness less important. Yet, I have this feeling that for counting items in lists a strict foldl should perform better than any foldr, because there's no way we can extract anything from an Integer.

That is correct, and tail-calls are more efficient. But this benefit can be outweighed by the cost of creating large thunks, and this is the case for foldl.

The way to have your cake and eat it too is to make sure that the accumulator is not thunked, but rather eagerly evaluated:

myfold:: (b -> a -> b) -> b -> [a] -> b
myfold f !b [] = b
myfold f !b (x:xs) = myfold f (f b x) xs

Which is, of course, the foldl' function.

TL;DR: Never use foldl, but do use foldl'.

Why does foldl seems to be harmful despite being tail-recursive?

1 Answers