I'm trying to use create a Clojure seq from some iterative Java library code that I inherited. Basically what the Java code does is read records from a file using a parser, sends those records to a processor and returns an ArrayList of result. In Java this is done by calling parser.readData(), then parser.getRecord() to get a record then passing that record into processor.processRecord(). Each call to parser.readData() returns a single record or null if there are no more records. Pretty common pattern in Java.
So I created this next-record function in Clojure that will get the next record from a parser.
(defn next-record
"Get the next record from the parser and process it."
[parser processor]
(let [datamap (.readData parser)
row (.getRecord parser datamap)]
(if (nil? row)
nil
(.processRecord processor row 100))))
The idea then is to call this function and accumulate the records into a Clojure seq (preferably a lazy seq). So here is my first attempt which works great as long as there aren't too many records:
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (datamap-seq parser processor)))))
I can create a parser and processor, and do something like (take 5 (datamap-seq parser processor)) which gives me a lazy seq. And as expected getting the (first) of that seq only realizes one element, doing count realizes all of them, etc. Just the behavior I would expect from a lazy seq.
Of course when there are a lot of records I end up with a StackOverflowException. So my next attempt was to use loop-recur to do the same thing.
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(loop [records (seq '())]
(if-let [record (next-record parser processor)]
(recur (cons record records))
records))))
Now using this the same way and defing it using (def results (datamap-seq parser processor)) gives me a lazy seq and doesn't realize any elements. However, as soon as I do anything else like (first results) it forces the realization of the entire seq.
Can anyone help me understand where I'm going wrong in the second function using loop-recur that causes it to realize the entire thing?
UPDATE:
I've looked a little closer at the stack trace from the exception and the stack overflow exception is being thrown from one of the Java classes. BUT it only happens when I have the datamap-seq function like this (the one I posted above actually does work):
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (remove empty? (datamap-seq parser processor))))))
I don't really understand why that remove causes problems, but when I take it out of this funciton it all works right (I'm doing the removal of empty lists somewhere else now).