In this SO thread, I learned that keeping a reference to a seq on a large collection will prevent the entire collection from being garbage-collected.
First, that thread is from 2009. Is this still true in "modern" Clojure (v1.4.0 or v1.5.0)?
Second, does this issue also apply to lazy sequences? For example, would (def s (drop 999 (seq (range 1000)))) allow the garbage collector to retire the first 999 elements of the sequence?
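As background for this question: a lazy seq is a singly linked chain, so realized elements stay in memory only while something still refers to an earlier node. The minimal sketch below contrasts the two cases. The function names are hypothetical, and the sketch does not settle the exact behavior of (drop 999 ...), which can also depend on chunking.

```clojure
;; Retains the head: `xs` is still needed for `(first xs)` after `count`
;; has realized every element, so the whole realized seq stays reachable.
(defn count-then-first [n]
  (let [xs (map inc (range n))]
    [(count xs) (first xs)]))

;; Does not retain the head: the lazy seq is handed straight to `reduce`
;; and nothing else refers to its first node, so nodes that `reduce` has
;; already walked past can be collected.
(defn sum-streaming [n]
  (reduce + 0 (map inc (range n))))
```

For large n on a modest heap, the first function would be expected to run out of memory well before the second.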
Lastly, is there a good way around this issue for large collections? In other words, if I had a vector of, say, 10 million elements, could I consume the vector in such a way that the consumed parts could be garbage collected? What about if I had a hashmap with 10 million elements?
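A persistent vector is fully realized. Every element and tree node stays reachable for as long as the vector itself is, so elements you have already processed cannot be reclaimed while anything still points at the vector. The same holds for a hash map. One common workaround is to consume the data in batches, but it only applies if the data can be produced as a lazy seq (for example, read incrementally from a file) rather than loaded into a vector up front. The name batch-sums and the summing step are hypothetical stand-ins for real per-batch work.

```clojure
;; Sums a (possibly huge) lazy seq in fixed-size batches and returns the
;; per-batch sums. partition-all is lazy and reduce walks it front to back,
;; so once a batch has been summed nothing here refers to it any more.
;; This only helps if the caller does not also hold the head of `xs`.
(defn batch-sums [xs batch-size]
  (reduce (fn [acc batch] (conj acc (reduce + 0 batch)))
          []
          (partition-all batch-size xs)))
```

For example, (batch-sums (range 10000000) 100000) walks ten million numbers without ever materializing them in a vector and returns only 100 sums.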
The reason I ask is that I'm operating on fairly large data sets and have to be more careful not to retain references to objects, so that the ones I no longer need can be garbage collected. As it is, I'm running into java.lang.OutOfMemoryError: GC overhead limit exceeded in some cases.
(drop 999990 (vec (range 1000000))) is due to the intervening vector and the behavior of subvectoring. I don't suspect a lazy consed sequence would do this. If you need to release a vector while retaining a subvector, you can copy the subvector into a new vector. Very interesting question, though; I'm waiting to see the answers too! - A. Webb
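The copying advice in the comment can be sketched as follows. The helper names tail-view and tail-copy are hypothetical, and the sketch assumes you only need to keep the last n elements.

```clojure
;; subvec returns a view that shares structure with, and keeps a
;; reference to, the entire parent vector.
(defn tail-view [v n]
  (subvec v (max 0 (- (count v) n))))

;; into [] copies just the selected elements into a brand-new vector,
;; so the result no longer references the parent; once nothing else
;; points at the parent, it becomes eligible for garbage collection.
(defn tail-copy [v n]
  (into [] (tail-view v n)))
```

For example, (def kept (tail-copy (vec (range 1000000)) 10)) keeps ten elements. The million-element vector built inline is not retained by kept.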