how does enumFromTo work in haskell and what optimizations speed up the GHC implementation vs naive implementations

Question

I'm learning haskell, and one of the exercises required that I write a function equivalent to enumFromTo.

I came up with the following two implementations:

eft' :: Enum a => a -> a -> [a]
eft' x y = go x y []
  where go a b sequence
          | fromEnum b < fromEnum a = sequence
          | otherwise = go (succ a) b (sequence ++ [a])

eft :: Enum a => a -> a -> [a]
eft x y = go x y []
  where go a b sequence
          | fromEnum b < fromEnum a = sequence
          | otherwise = go a (pred b) (b : sequence)

I had a hunch that the first version does more work, as it puts each element into a list and concatenates to the existing sequence, while the second version prepends a single element to a list. Is this the main reason for the performance difference or are there other significant factors or is my hunch slightly off the mark?

Testing in ghci with :set +s reveals on my machine (Windows 10, GHC 8.2.2, intel i7-4770HQ):

*Lists> take 10 (eft 1 10000000)
[1,2,3,4,5,6,7,8,9,10]
(9.77 secs, 3,761,292,096 bytes)
*Lists> take 10 (eft' 1 10000000)
[1,2,3,4,5,6,7,8,9,10]
(27.97 secs, 12,928,385,280 bytes)
*Lists> take 10 (enumFromTo 1 10000000)
[1,2,3,4,5,6,7,8,9,10]
(0.00 secs, 1,287,664 bytes)

My second hunch was that the take 10 (eft 1 10000000) should perform better than take 10 (eft' 10000000) because the latter has to build the list up all the way from 10000000 to 10 before it can return any useful values that we take. Clearly this hunch was wrong, and I'm hoping someone can explain why.

Finally, the ghc implementation is incredibly more efficient than my naive implementations. I am curious to understand other optimizations have been applied to speed it up. The answer to this similarly titled SO question shares some code that seems to be from the ghc implementation, but doesn't explain how the "nastiness" gains efficiency.

N.B.: While this doesn't compromise the relevance of this specific question, note that performance testing in GHCi isn't very realistic, as the code will be unoptimised. You'd usually want to test after compiling with -O2 (note that can be done within GHCi with the help of the -fobject-code option). — duplode
xs++[x] needs to scan the whole list xs, making a copy of it, so to add x at the end. Instead, x:xs does not scan or copy anything, and only requires O(1) time instead of O(N). — chi

duplode duplode · Accepted Answer · 2018-03-06T06:19:02

The problem with eft is that it still requires the entire list to be built, regardless of your attempt to cut it down with take 10. Tail recursion is not your friend when you want to build things lazily. What you want is guarded recursion (i.e. recursive calls right behind the relevant constructor, as in foldr, so that they can be left unevaluated when you don't need them):

eft'' :: Enum a => a -> a -> [a]
eft'' x y
    | fromEnum y < fromEnum x = []
    | otherwise = x : eft'' (succ x) y

GHCi> take 10 (eft 1 10000000)
[1,2,3,4,5,6,7,8,9,10]
(7.48 secs, 2,160,291,096 bytes)
GHCi> take 10 (eft'' 1 10000000)
[1,2,3,4,5,6,7,8,9,10]
(0.00 secs, 295,752 bytes)
GHCi> take 10 (enumFromTo 1 10000000)
[1,2,3,4,5,6,7,8,9,10]
(0.00 secs, 293,680 bytes)

As for eft' being worse than eft, that indeed has to do with (++). For reference, here are definitions for take and (++) (I'm using the Report definitions rather than the GHC ones, but the slight differences don't actually matter here):

take                   :: Int -> [a] -> [a]  
take n _      | n <= 0 =  []  
take _ []              =  []  
take n (x:xs)          =  x : take (n-1) xs 

(++) :: [a] -> [a] -> [a]  
[]     ++ ys = ys  
(x:xs) ++ ys = x : (xs ++ ys)

If you hand-evaluate eft, you get to see how it has to build the entire list before giving you any element:

take 3 (eft 1 5)
take 3 (go 1 5 [])
take 3 (go 1 4 (5 : []))
take 3 (go 1 3 (4 : 5 : []))
-- etc.
take 3 (1 : 2 : 3 : 4 : 5 : [])
1 : take 2 (2 : 3 : 4 : 5 : [])
1 : 2 : take 1 (3 : 4 : 5 : [])
-- etc.

At least, though, once you get past the gos the list is ready for consumption. That is not the case with eft' -- the (++) still have to be dealt with, and doing so is linear with respect to the length of the list:

take 3 (eft' 1 5)
take 3 (go 1 5 [])
take 3 (go 2 5 ([] ++ [1]))
take 3 (go 3 5 (([] ++ [1]) ++ [2]))
-- etc.
take 3 ((((([] ++ [1]) ++ [2]) ++ [3]) ++ [4]) ++ [5])
take 3 (((([1] ++ [2]) ++ [3]) ++ [4]) ++ [5])
take 3 ((((1 : ([] ++ [2])) ++ [3]) ++ [4]) ++ [5])
take 3 ((((1 : [2]) ++ [3]) ++ [4]) ++ [5])
take 3 (((1 : ([2] ++ [3])) ++ [4]) ++ [5])
-- etc.
take 3 (1 : ((([2] ++ [3]) ++ [4]) ++ [5]))
1 : take 2 ((([2] ++ [3]) ++ [4]) ++ [5])

It gets worse: you have to do it again with the remaining tail of the list for every single element!

1 : take 2 ((([2] ++ [3]) ++ [4]) ++ [5])
1 : take 2 (((2 : ([] ++ [3])) ++ [4]) ++ [5])
1 : take 2 (((2 : [3]) ++ [4]) ++ [5])
1 : take 2 ((2 : ([3] ++ [4])) ++ [5])
-- etc.
1 : take 2 (2 : (([3] ++ [4]) ++ [5]))
1 : 2 : take 1 (([3] ++ [4]) ++ [5])
-- etc.

In fact, the take 10 disguises the fact that eft', unlike eft, is quadratic:

GHCi> last $ eft' 1 10000
10000
(1.83 secs, 4,297,217,200 bytes)
GHCi> last $ eft' 1 20000
20000
(7.59 secs, 17,516,804,952 bytes)

GHCi> last $ eft 1 5000000
5000000
(3.81 secs, 1,080,282,784 bytes)
GHCi> last $ eft 1 10000000
10000000
(7.51 secs, 2,160,279,232 bytes)

For the sake of completeness, here is the corresponding hand-evaluation for eft'':

take 3 (eft'' 1 5)
take 3 (1 : eft'' 2 5)
1 : take 2 (eft'' 2 5) -- No need to evaluate `eft'' 2 5` to get the first element.
1 : take 2 (2 : eft'' 3 5)
1 : 2 : take 1 (eft'' 3 5)
-- etc.
1 : 2 : 3 : take 0 (eft'' 4 5) -- No need to go further.
1 : 2 : 3 : []

how does enumFromTo work in haskell and what optimizations speed up the GHC implementation vs naive implementations

1 Answers