3
votes

I need to return a sequence, a number and a hash-map from my function (all wrapped in a vector) so that the printed return value looks like this:

[ ([:c :a] [:e :c] [:f :e] [:d :e] [:g :f] [:b :a])  15
  {:g :c, :f :a, :c :e, :d :a, :b :a, :c :a} ]

Since my inputs could be large, I'd like to return lazy sequences/objects from my function. The sequence of pairs (the first object in my return vector) was easy enough to make lazy by wrapping 'lazy-seq' around the conj calls that build it up.

The hash-map (3rd object in my return vector and potentially very large like my sequence) is being built-up (using assoc calls) in the same loop-recur block as the sequence. The hash-map is additional info that some of my callers will use but if the pairs-sequence is returned as lazy then I'm wondering if it makes sense to send back a potentially huge hash-map with (an efficient) lazy-seq even if I make it an optional return-value. The entries in the hash-map are related to the pairs in the lazy-sequence.

So here is my noobie question: Is there any sense in sending back a lazy-sequence of MapEntry's in place of a large HashMap? That is, assuming a user would take a chunk of the lazy-seq of MapEntrys, convert them to hashmap to do a lookup..failing which they'd take the next chunk and so on. Is this a sensible way to lazily use associative-data? Are there some idiomatic ways to return/manage large associative data in Clojure? Would appreciate any ideas as to what my options are. Thanks in advance for your help.

3

3 Answers

6
votes

No, giving them a lazy map is not possible. A lazy sequence of MapEntries is possible, but not very useful. There are a number of other options that might make sense that are similar, though.

  • You say the caller might not need the map at all: so return a delay of a map, which they can force if they need it.
  • If the keys are cheap to compute, but the values are expensive, you can return a full map with the correct keys, and values which are each delays; the caller can force only the values they need.

You can still return a lazy-seq of vectors (I wouldn't bother with making them MapEntries), but there's basically no way the caller will be able to treat this as a lazy map. Either they only want to look up a fixed set of already-known keys (in which case they'll just lazily filter over the entries, never making it a map) or they want to look up entries arbitrarily, in which case they'll have to keep all the entries in memory after looking up the first entry, so that they can still look up the second, and so they might as well just dump the whole thing into a fully-realized map.

1
votes

No, Clojure does not have lazy maps.

Also, if you are building up a sequence using loop/recur, I don't believe that trying to make it lazy accomplishes anything (unless generating each element is slow).

Look at these two functions:

(defn bad-lazy-range [begin end]
  (loop [i (dec end) lst nil]
    (if (>= i begin)
      (recur (dec i) (lazy-seq (cons i lst)))
      lst)))

(defn good-lazy-range [begin end]
  (if (>= begin end)
    nil
    (lazy-seq (cons begin (good-lazy-range (inc begin) end)))))

bad-lazy-range will recur begin-end times, generating a thunk (a lazy sequence link) each time, and then return the outermost thunk. This thunk needs to keep the reference to the next thunk, which needs a reference to the third thunk, etc. You do all the work immediately and generate a pseudo-linked list of thunks which takes up more space than a normal list would.

good-lazy-range, however, returns immediately without recursing more -- the recursive call is hidden inside the thunk and won't be evaluated until necessary. This also prevents a stack overflow exception -- without the lazy-seq call, it could generate a stack overflow exception, but at each step, it evaluates one call to good-lazy-range and returns. The caller can then evaluate the next call, but at this point, the stack frame from the first call is long gone.

In general, only use lazy-seq if you can wrap it around a significant amount of computation. In the first function, it is only wrapped around a call to cons, which would return quickly anyway. In the second function, however, it is wrapped around a call to cons and the recursive call, which means that it is delaying a worthwhile amount of computation.

If your code uses lazyness correctly and uses loop/recur, please post it -- I would be interested to see how you did it.

0
votes

From the example you gave why return the map at all caller can just build the map from the seq using (into {} s)