1
votes

I'm trying to create a Clojure seq from some iterative Java library code that I inherited. Basically, the Java code reads records from a file using a parser, sends those records to a processor, and returns an ArrayList of results. In Java this is done by calling parser.readData(), then parser.getRecord() to get a record, and then passing that record into processor.processRecord(). Each call to parser.readData() returns a single record or null if there are no more records. It's a pretty common pattern in Java.

So I created this next-record function in Clojure that will get the next record from a parser.

(defn next-record
  "Get the next record from the parser and process it."
  [parser processor]
  (let [datamap (.readData parser)
        row (.getRecord parser datamap)]
    (if (nil? row)
      nil
      (.processRecord processor row 100))))

The idea then is to call this function and accumulate the records into a Clojure seq (preferably a lazy seq). So here is my first attempt which works great as long as there aren't too many records:

(defn datamap-seq
  "Returns a lazy seq of the records using the given parser and processor"
  [parser processor]
  (lazy-seq
    (when-let [records (next-record parser processor)]
      (cons records (datamap-seq parser processor)))))

I can create a parser and processor, and do something like (take 5 (datamap-seq parser processor)) which gives me a lazy seq. And as expected getting the (first) of that seq only realizes one element, doing count realizes all of them, etc. Just the behavior I would expect from a lazy seq.

Of course, when there are a lot of records I end up with a StackOverflowError. So my next attempt was to use loop-recur to do the same thing.

(defn datamap-seq
  "Returns a lazy seq of the records using the given parser and processor"
  [parser processor]
  (lazy-seq
    (loop [records (seq '())]
      (if-let [record (next-record parser processor)]
        (recur (cons record records))
        records))))

Now, using this the same way and defining it with (def results (datamap-seq parser processor)) gives me a lazy seq and doesn't realize any elements. However, as soon as I do anything else, like (first results), it forces the realization of the entire seq.

Can anyone help me understand where I'm going wrong in the second function using loop-recur that causes it to realize the entire thing?

UPDATE:

I've looked a little closer at the stack trace from the exception and the stack overflow exception is being thrown from one of the Java classes. BUT it only happens when I have the datamap-seq function like this (the one I posted above actually does work):

(defn datamap-seq
  "Returns a lazy seq of the records using the given parser and processor"
  [parser processor]
  (lazy-seq
    (when-let [records (next-record parser processor)]
      (cons records (remove empty? (datamap-seq parser processor))))))

I don't really understand why that remove causes problems, but when I take it out of this function it all works right (I'm doing the removal of empty lists somewhere else now).

2 Answers

4
votes

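loop/recur keeps looping inside the loop expression until the recursion runs out, so the first time the seq is touched the whole loop runs to completion. Wrapping it in lazy-seq only delays when that happens; it won't prevent it.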
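To make the difference concrete, here is a minimal sketch that counts how often the underlying source is read. The `make-source` helper is hypothetical: it stands in for the parser and processor from the question and just hands out the numbers 1 to `limit`, then nil.

```clojure
;; Hypothetical stand-in for the parser/processor pair: each call to
;; :next! returns the next number, or nil once `limit` items are used up.
;; :calls counts every call, including the final one that returns nil.
(defn make-source [limit]
  (let [calls (atom 0)]
    {:calls calls
     :next! (fn []
              (let [n (swap! calls inc)]
                (when (<= n limit) n)))}))

;; Same shape as the loop/recur version in the question: the loop drains
;; the source completely the first time the lazy-seq body runs.
(defn eager-items [next!]
  (lazy-seq
    (loop [acc '()]
      (if-let [x (next!)]
        (recur (cons x acc))
        acc))))

;; Same shape as the lazy-seq/cons version: one read per realized element.
(defn lazy-items [next!]
  (lazy-seq
    (when-let [x (next!)]
      (cons x (lazy-items next!)))))
```

With a source of 5 items, nothing is read when either seq is created. Calling `first` on the `eager-items` seq performs 6 reads (5 items plus the terminating nil) and returns 5, because the loop also builds the list in reverse. Calling `first` on the `lazy-items` seq performs a single read and returns 1.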

Your first attempt with lazy-seq / cons should already work as you want, without stack overflows. I can't spot right now what the problem with it is, though it might be in the java part of the code.
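One possible explanation for the variant with remove from the question's update, offered as a sketch rather than a confirmed diagnosis: calling remove inside the recursive step wraps the tail in one more lazy filter at every level, so the n-th element has to pass through roughly n nested remove layers before it is produced. Realizing a deep element then needs a deep call stack. The functions below are hypothetical stand-ins that produce one-element vectors instead of parsed records.

```clojure
;; Mirrors the shape from the update: remove is applied again at every
;; level of the recursion, so the filters pile up on the tail.
(defn nested-remove [n]
  (lazy-seq
    (when (pos? n)
      (cons [n] (remove empty? (nested-remove (dec n)))))))

;; Plain lazy recursion with no filtering inside.
(defn plain-records [n]
  (lazy-seq
    (when (pos? n)
      (cons [n] (plain-records (dec n))))))

;; Filtering applied once, around the whole seq.
(defn non-empty-records [n]
  (remove empty? (plain-records n)))

;; Both produce the same elements for small inputs:
;; (= (nested-remove 10) (non-empty-records 10))
;; For large n, (count (nested-remove n)) is likely to overflow the stack
;; on a default JVM, while (count (non-empty-records n)) should not.
```

Applying remove once on the outside is also what the asker ended up doing by moving the filtering elsewhere.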

2
votes

I'll post this as an addition to Joost's answer. This code:

(defn integers [start]
  (lazy-seq 
    (cons
      start
      (integers (inc start)))))

will not throw a StackOverflowError if I do something like this:

(take 5 (drop 1000000 (integers 0)))

EDIT:

Of course, a better way to do it would be (iterate inc 0). :)

EDIT2:

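I'll try to explain a little how lazy-seq works. lazy-seq is a macro that wraps its body in a function and returns a seq-like object; the body runs only when the seq is first requested. Because the second argument to cons here is itself an unrealized lazy seq, the rest of the sequence is not computed until it is requested, and that is where the laziness comes from.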

Now take a look at how the LazySeq class is implemented. LazySeq.sval triggers computation of the next value, which may itself be another "frozen" lazy sequence. The LazySeq.seq method shows the mechanics even better: it uses a while loop to unwrap nested LazySeq instances rather than recursing. This means stack use is limited to short function calls that each return another instance of LazySeq.
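The behavior described above can be observed from the REPL without reading the Java source. This is a small sketch: the `thunk-calls` counter is an addition for illustration only, and `integers` is redefined here with that counter inside the lazy-seq body.

```clojure
;; Counts how many times a lazy-seq body has actually been run.
(def thunk-calls (atom 0))

(defn integers [start]
  (lazy-seq
    (swap! thunk-calls inc)
    (cons start (integers (inc start)))))

(def s (integers 0))

(realized? s)          ; false: nothing has been computed yet
(first s)              ; 0, runs the outermost body once
(realized? s)          ; true
(realized? (rest s))   ; false: the tail is still a "frozen" LazySeq
@thunk-calls           ; 1
(nth s 100000)         ; 100000, walks the seq one short call at a time
```

Each step only runs one small body and returns a cons cell whose tail is another unrealized LazySeq, which is why walking far into the sequence does not deepen the stack.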

I hope this makes sense. I described what I could deduce from the source code. Please let me know if I made any mistakes.