4
votes

If I call the following Haskell code

find_first_occurrence :: (Eq a) => a -> [a] -> Int
find_first_occurrence elem list = (snd . head) [x | x <- zip list [0..], fst x == elem]

with the arguments

'X' "abcdXkjdkljklfjdlfksjdljjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"

how much of the zipped list [('a',0), ('b',1), ] is going to be built?

UPDATE:

I tried to run

find_first_occurrence 10 [1..]

and returns 9 almost instantly, so I guess it does use lazy evaluation at least for simple cases? The answer is also computed "instantly" when I run

let f n = 100 - n
find_first_occurrence 10 (map f [1..])
3
Correction: second expression should have 'X' in front, not 'x'.quant_dev
Re "I guess it does use lazy evaluation at least for simple cases": Haskell uses lazy evaluation for all cases; the only exception is in the presence of seq.Daniel Wagner
@Daniel Wagner, another exception is pattern matching, right?Rotsor
@Rotsor: I'm not sure about it, Haskell in general may be able to pattern match an expression without actually fully evaluating it. case zip [1..] [2..] of (a,b):_ -> True evaluates to True, because we only need the first element of the list to match with (a,b):_.Riccardo T.
@Rotsor Pattern-matching is what drives evaluation, but saying that Haskell doesn't use laziness when doing pattern matching would be very misleading. Whether you match foo against the pattern x, or _:_, or (_:_):_, the runtime won't evaluate foo any farther than it needs to to decide which pattern matches.Daniel Wagner

3 Answers

6
votes

Short answer: it will be built only up to the element you're searching for. This means that only in the worst case you'll need to build the whole list, that is when no element satisfies the conditions.

Long answer: let me explain why with a pair of examples:

ghci> head [a | (a,b) <- zip [1..] [1..], a > 10]
11

In this case, zip should produce an infinite list, however the laziness enables Haskell to build it only up to (11,11): as you can see, the execution does not diverge but actually gives us the correct answer.

Now, let me consider another issue:

ghci> find_first_occurrence 1 [0, 0, 1 `div` 0, 1]
*** Exception: divide by zero
ghci> find_first_occurrence 1 [0, 1, 1 `div` 0, 0]
1
it :: Int
(0.02 secs, 1577136 bytes)

Since the whole zipped list is not built, haskell obviously will not even evaluate each expression occurring in the list, so when the element is before div 1 0, the function is correctly evaluated without raising exceptions: the division by zero did not occur.

5
votes

All of it.

Since StackOverflow won't let me post such a short answer: you can't get away with doing less work than looking through the whole list if the thing you're looking for isn't there.

Edit: The question now asks something much more interesting. The short answer is that we will build the list:

('a',0):('b',1):('c',2):('d',3):('X',4):<thunk>

(Actually, this answer is just the slightest bit subtle. Your type signature uses the monomorphic return type Int, which is strict in basically all operations, so all the numbers in the tuples above will be fully evaluated. There are certainly implementations of Num for which you would get something with more thunks, though.)

5
votes

You can easily answer such a question by introducing undefineds here and there. In our case it is sufficient to change our inputs:

find_first_occurrence 'X' ("abcdX" ++ undefined)

You can see that it produces the result, which means that it does not even look beyond the 'X' it found (otherwise it would have thrown an Exception). Obviously, the zipped list can not be built without looking at the original list.

Another (possibly less reliable) way to analyse your laziness is to use trace function from Debug.Trace:

> let find_first_occurrence elem list = (snd . head) [x | x <- map (\i -> trace (show i) i) $ zip list [0..], fst x == elem]
> find_first_occurrence 'X' "abcdXkjdkljklfjdlfksjdljjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"

Prints

('a',0)
('b',1)
('c',2)
('d',3)
('X',4)
4