I was under the impression that lazy seqs were always chunked.
=> (take 1 (map #(do (print \.) %) (range)))
(................................0)
As expected 32 dots are printed because the lazy seq returned by range
is chunked into 32-element chunks. However, when I try this with my own function get-rss-feeds instead of range, the lazy seq is no longer chunked:
=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")
Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds
is not chunked. Indeed:
=> (chunked-seq? (seq (range)))
true
=> (chunked-seq? (seq (get-rss-feeds r)))
false
Here is the source for get-rss-feeds:
(defn get-rss-feeds
  "returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
  [hr]
  (map #(:href (:attrs %))
       (filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link]))))
So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?
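To make the observation above concrete, here is a minimal sketch. The helper realized-count is a hypothetical name (not part of clojure.core); it counts how often the mapping function runs when only the first result is taken. The numbers in the comments assume a Clojure version whose chunks hold 32 elements, and the example uses (range 100) rather than (range).

```clojure
;; Hypothetical helper (not in clojure.core): maps a counting function over
;; coll, takes only the first result, and reports how many elements were mapped.
(defn realized-count [coll]
  (let [calls (atom 0)]
    (doall (take 1 (map (fn [x] (swap! calls inc) x) coll)))
    @calls))

(realized-count (range 100))        ; 32: range yields a chunked seq
(realized-count (vec (range 100)))  ; 32: seqs over vectors are chunked too
(realized-count (list 1 2 3 4 5))   ; 1:  a plain list is not chunked

;; map preserves the chunkiness of its source:
(chunked-seq? (seq (map inc [1 2 3])))   ; true
(chunked-seq? (seq (map inc '(1 2 3))))  ; false
```

So map itself is neither chunked nor unchunked; it processes a whole chunk at a time only when the seq it is given is already chunked.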
Here's why I need to know.
I have the following code: (get-rss-entry (get-rss-feeds h-res) url)
The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.
The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an HTTP request across the network to fetch a new RSS feed. To minimize the number of HTTP requests, it's important to examine the sequence one by one and stop as soon as there is a match.
Here is the code:
(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))
entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.
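One way to check how many feeds actually get fetched is to swap in a stub that counts calls. The sketch below is only an illustration: fetch-count, make-feeds, and this entry-with-url are hypothetical stand-ins (the real one does an HTTP request), and the feed data is made up. It assumes 32-element chunks.

```clojure
(def fetch-count (atom 0))

;; Hypothetical stand-in for the real entry-with-url: counts each "request"
;; and returns a seq of matching entries (empty when nothing matches).
(defn entry-with-url [feed url]
  (swap! fetch-count inc)
  (filter #(= url (:link %)) (:entries feed)))

(defn get-rss-entry [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))

;; Made-up feeds: feed i holds a single entry linking to http://example.com/i
(defn make-feeds [n]
  (map (fn [i] {:entries [{:link (str "http://example.com/" i)}]}) (range n)))

;; Feeds held in a plain list (not chunked): stops right at the match.
(reset! fetch-count 0)
(get-rss-entry (apply list (make-feeds 50)) "http://example.com/2")
@fetch-count ; 3

;; Same feeds held in a vector (chunked): the whole first chunk is fetched.
(reset! fetch-count 0)
(get-rss-entry (vec (make-feeds 50)) "http://example.com/2")
@fetch-count ; 32
```

The function is the same in both runs; only the kind of seq passed as feeds differs.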
I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.
Am I using lazy seq non-idiomatically? Would loop/recur be a better option?
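For reference, the usual way to avoid chunked behavior is to re-wrap the seq one element at a time. This is a sketch of that commonly seen idiom; unchunk is a hypothetical helper name and is not part of clojure.core.

```clojure
;; Hypothetical helper (not in clojure.core): rebuilds s as a lazy seq of
;; single cons cells, so consumers such as map never see a chunk.
(defn unchunk [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (unchunk (rest s))))))

(def calls (atom 0))

;; Without unchunk this would run the function on a whole chunk of (range 100).
(first (map (fn [x] (swap! calls inc) x) (unchunk (range 100))))
@calls ; 1
```

Wrapping the feeds argument this way, for example (map #(entry-with-url % url) (unchunk feeds)), would guard against a chunked source being passed in later.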
clojure.core and/or your sequence implements the IChunk and IChunkedSeq interfaces. Currently (in 1.4.0), these are undocumented. – noahlz