16
votes

I'm trying to understand clojure's lazy-seq operator, and the concept of lazy evaluation in general. I know the basic idea behind the concept: Evaluation of an expression is delayed until the value is needed.

In general, this is achievable in two ways:

  • at compile time using macros or special forms;
  • at runtime using lambda functions

With lazy evaluation techniques, it is possible to construct infinite data structures that are evaluated as consumed. These infinite sequences utilizes lambdas, closures and recursion. In clojure, these infinite data structures are generated using lazy-seq and cons forms.

I want to understand how lazy-seq does it's magic. I know it is actually a macro. Consider the following example.

(defn rep [n]
  (lazy-seq (cons n (rep n))))

Here, the rep function returns a lazily-evaluated sequence of type LazySeq, which now can be transformed and consumed (thus evaluated) using the sequence API. This API provides functions take, map, filter and reduce.

In the expanded form, we can see how lambda is utilized to store the recipe for the cell without evaluating it immediately.

(defn rep [n]
  (new clojure.lang.LazySeq (fn* [] (cons n (rep n))))) 
  • But how does the sequence API actually work with LazySeq?
  • What actually happens in the following expression?

(reduce + (take 3 (map inc (rep 5))))

  • How is the intermediate operation map applied to the sequence,
  • how does take limit the sequence and
  • how does terminal operation reduce evaluate the sequence?

Also, how do these functions work with either a Vector or a LazySeq?

Also, is it possible to generate nested infinite data structures?: list containing lists, containing lists, containing lists... going infinitely wide and deep, evaluated as consumed with the sequence API?

And last question, is there any practical difference between this

(defn rep [n]
  (lazy-seq (cons n (rep n))))

and this?

(defn rep [n]
  (cons n (lazy-seq (rep n))))
1

1 Answers

25
votes

That's a lot of questions!

How does the seq API actually works with LazySeq?

If you take a look at LazySeq's class source code you will notice that it implements ISeq interface providing methods like first, more and next.

Functions like map, take and filter are built using lazy-seq (they produce lazy sequences) and first and rest (which in turn uses more) and that's how they can work with lazy seq as their input collection - by using first and more implementations of LazySeq class.

What actually happens in the following expression?

(reduce + (take 3 (map inc (rep 5))))

The key is to look how LazySeq.first works. It will invoke the wrapped function to obtain and memoize the result. In your case it will be the following code:

(cons n (rep n))

Thus it will be a cons cell with n as its value and another LazySeq instance (result of a recursive call to rep) as its rest part. It will become the realised value of this LazySeq object and first will return the value of the cached cons cell.

When you call more on it, it will in the same way ensure that the value of the particular LazySeq object is realised (or reused memoized value) and call more on it (in this case more on the cons cell containing another LazySeq object).

Once you obtain another instance of LazySeq object with more the story repeats when you call first on it.

map and take will create another lazy-seq that will call first and more of the collection passed as their argument (just another lazy seq) so it will be similar story. The difference will be only in how the values passed to cons are generated (e.g. calling f to a value obtained by first invoked on the LazySeq value mapped over in map instead of a raw value like n in your rep function).

With reduce it's a bit simpler as it will use loop with first and more to iterate over the input lazy seq and apply the reducing function to produce the final result.

As the actual implementation looks like for map and take I encourage you to check their source code - it's quite easy to follow.

How seq API can works with different collection types (e.g. lazy seq and persistent vector)?

As mentioned above, map, take and other functions work in terms of first and rest (reminder - rest is implemented on top of more). Thus we need to explain how first and rest/more can work with different collection types: they check if the collection implements ISeq (and then it implement those functions directly) or they try to create a seq view of the collection and coll its implementation of first and more.

Is it possible to generate nested infinite data structures?

It's definitely possible but I am not sure what the exact data shape you would like to get. Do you mean getting a lazy seq which generates another sequence as it's value (instead of a single value like n in your rep) but returns it as a flat sequence?

(defn nested-cons [n]
  (lazy-seq (cons (repeat n n) (nested-cons (inc n)))))

(take 3 (nested-cons 1))

;; => ((1) (2 2) (3 3 3))

that would rather return (1 2 2 3 3 3)?

For such cases you might use concat instead of cons which creates a lazy sequence of two or more sequences:

(defn nested-concat [n]
  (lazy-seq (concat (repeat n n) (nested-concat (inc n)))))

(take 6 (nested-concat 1))

;; => (1 2 2 3 3 3)

Is there any practical difference with this

(defn rep [n]
  (lazy-seq (cons n (rep n))))

and this?

(defn rep [n]
  (cons n (lazy-seq (rep n))))

In this particular case not really. But in the case where a cons cell doesn't wrap a raw value but a result of a function call to calculate it, the latter form is not fully lazy. For example:

(defn calculate-sth [n]
  (println "Calculating" n)
  n)

(defn rep1 [n]
  (lazy-seq (cons (calculate-sth n) (rep1 (inc n)))))

(defn rep2 [n]
  (cons (calculate-sth n) (lazy-seq (rep2 (inc n)))))

(take 0 (rep1 1))
;; => ()

(take 0 (rep2 1))
;; Prints: Calculating 1
;; => ()

Thus the latter form will evaluate its first element even if you might not need it.