
I'm trying to pipe the output of the parts-of-speech function into the index-words function and print the resulting output using the (->) thread macro:

(defn parts-of-speech []
  (seq (. POS values)))

(defn index-words [pos]
  (iterator-seq (. dict getIndexWordIterator pos)))

(-> (parts-of-speech) index-words println)

But index-words returns a seq created by iterator-seq, and since I'm new to Clojure, I'm not sure how to iterate over it in this context.

EDIT: Updated code per suggestions.

UPDATE:

Thanks to answers from @kotarak and @jayunit100 and comments from @sw1nn and @marko-topolnik, I have at least two variations that work:

(->> (parts-of-speech) (map index-words) (map println) doall)

(doseq [w (map index-words (parts-of-speech))]
  (println w))

I'm coming from an imperative background, and my goal with this question is to understand the thread macro so I can write more idiomatic Clojure (before experimenting with the thread macro, I was looping over each sequence using nested doseq and let forms).

From the comments, it appears that the thread macro may not be the most idiomatic way to do this, but I still would like to see how to make it work so I can fill this gap in understanding.

Also, (parts-of-speech) returns a sequence of four items, and if you do a (println (count w)) instead of (println w), you can see it prints the count of four sequences rather than one continuous sequence:

(doseq [w (map index-words (parts-of-speech))]
  (println (count w)))

;= 117798
;= 11529
;= 21479
;= 4481

How would you modify the above to print one continuous stream of words instead of printing the contents of four sequences?

BTW: The above code is wrapping the MIT Java WordNet library (http://projects.csail.mit.edu/jwi/).

Is your question still open? This looks like correct code. Doesn't it print the sequence? To answer your comment below: in Clojure you don't use iterators. Seq iteration is achieved with doseq. So if you wish, you can say (doseq [w (index-words (parts-of-speech))] (println w)). - Marko Topolnik
BTW, iterating over a native Java array is the same as iterating over anything else in Clojure (anything seqable, that is). - Marko Topolnik

2 Answers


The relationship between seqs and iterator-seq is as follows: iterator-seq CREATES a seq from an iterator.

Forgive the verbosity here. To answer the question of "how to iterate over the output of iterator-seq", we first have to clearly define why you needed to call iterator-seq to begin with:

In Clojure, you won't often need to call iterator-seq yourself, because Clojure handles "Iterable" Java objects directly (see: http://clojuredocs.org/clojure_core/clojure.core/iterator-seq). However, iterators themselves are not Iterable.
To fully understand this, you need to look at the difference between Iterables and Iterators. Java keeps the two separate largely to keep iteration semantics consistent and straightforward (see: Why is Java's Iterator not an Iterable?).
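Here is a small REPL-style sketch of that distinction. It uses a plain java.util.ArrayList as a stand-in for any Iterable; the var names are illustrative only. Calling seq on the list works directly. Calling seq on the list's Iterator throws, because an Iterator is not Iterable. Only iterator-seq accepts the Iterator.

```clojure
;; A java.util.ArrayList implements java.lang.Iterable, so seq (and
;; therefore doseq, map, filter, ...) accept it directly.
(def words (java.util.ArrayList. ["alpha" "beta" "gamma"]))

(def list-is-iterable? (instance? Iterable words))                 ; true
(def iterator-is-iterable? (instance? Iterable (.iterator words))) ; false

;; Seq straight from the Iterable:
(def via-seq (seq words))

;; Seq from the Iterator, which has to go through iterator-seq:
(def via-iterator (iterator-seq (.iterator words)))

(defn seq-of-iterator-fails?
  "Returns true if calling seq directly on an Iterator throws
  (Clojure reports \"Don't know how to create ISeq from: ...\")."
  []
  (try
    (seq (.iterator words))
    false
    (catch IllegalArgumentException _ true)))
```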

So what is a 'seq'?

Clojure has a higher-level abstraction than Java's Iterator interface: the ISeq. iterator-seq creates an ISeq for us under the hood. That ISeq can then be used by the many Clojure functions that operate on sequential collections of items.

user=> (iterator-seq (.iterator (new java.util.ArrayList ["A" "B"])))
("A" "B")
;Thus, we now have an ISeq implementation derived from an iterator.  

So your call to iterator-seq creates a Clojure sequence from a Java iterator. The error message we get when calling iterator-seq on something that is not an Iterator makes this clear:

user=> (iterator-seq "ASDF")                                         
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Iterator (NO_SOURCE_FILE:0)

This tells us that the "iterator-seq" function REQUIRES a java.util.Iterator as input.

The next logical question you might have is:

Why do we need to create sequences from iterators? How is the seq abstraction different from the iterator abstraction in Java?

The Iterable interface is not as general as Clojure's ISeq. Consider strings: they are clearly sequential, yet java.lang.String does not implement Iterable. The same goes for Java arrays.

From the Clojure website:

"seq works on Java reference arrays, Iterables and Strings. Since much of the rest of the library is built upon these functions, there is great support for using Java objects in Clojure algorithms."

So the purpose of your iterator-seq call is to "wrap" your iterator object in the sequence abstraction, so that it can use all of Clojure's functional goodies.
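As a quick sketch of that quote, the snippet below shows that a String and a Java array are not java.lang.Iterable. seq still handles both, so the usual sequence functions work on them. Only a bare Iterator needs the explicit iterator-seq wrapper. The values are arbitrary examples.

```clojure
(def s "ABC")
(def arr (into-array String ["x" "y"]))

;; Neither a String nor a Java array implements java.lang.Iterable...
(def string-is-iterable? (instance? Iterable s))  ; false
(def array-is-iterable? (instance? Iterable arr)) ; false

;; ...but seq knows how to walk both, so map/filter/doseq work on them.
(def chars-of-s (seq s))     ; (\A \B \C)
(def items-of-arr (seq arr)) ; ("x" "y")
(def char-codes (map int s)) ; (65 66 67)
```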

Defining the role of iterator-seq

From http://clojure.org/sequences:

"The seq function yields an implementation of ISeq appropriate to the collection."

In your case, we can say that:

"The iterator-seq function yields an implementation of ISeq for the iterator returned by getIndexWordIterator."
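One practical consequence: a Java Iterator can only be walked once, but the seq built by iterator-seq caches each element as it is realized. You can therefore traverse the seq several times. The iterator-seq docstring warns that this caching means it should not be used on an iterator that repeatedly returns the same mutable object. The sketch below uses an ArrayList iterator as a hypothetical stand-in for the one returned by getIndexWordIterator.

```clojure
;; Stand-in for (.getIndexWordIterator dict pos): any java.util.Iterator.
(def it (.iterator (java.util.ArrayList. ["alpha" "beta" "gamma"])))

(def words (iterator-seq it))

;; First traversal realizes (and caches) every element, draining the iterator.
(def first-pass (doall words))

;; Second traversal reads from the cache; the iterator is not touched again.
(def second-pass (vec words))

(def iterator-exhausted? (not (.hasNext it)))
```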

Finally: How can I iterate over a seq?

This question needs to be answered carefully, given the context.

Iteration is certainly possible, but it is not a primary concern in Clojure, and it might not be what you are after. Since iterator-seq has already created a seq for us, we can usually consume that seq with one of Clojure's functional operators (e.g. a for comprehension or map). This removes the need for manual iteration.
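As a rough sketch of the two common cases, assuming an ArrayList iterator as stand-in data: doseq is the usual choice when you only want side effects such as printing. It is eager and returns nil. map builds a new lazy seq of results and prints nothing. The function and var names are illustrative.

```clojure
(def words (iterator-seq (.iterator (java.util.ArrayList. ["alpha" "beta"]))))

;; Side effects: doseq walks the whole seq immediately and returns nil.
(defn print-words [ws]
  (doseq [w ws]
    (println w)))

;; Transformation: map returns a new (lazy) seq of results.
(def word-lengths (map count words)) ; (5 4)
```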

For example, often, we iterate through a list to find a value. In clojure, we can find a value by using the filter function:

user=> (filter #(= \A %) (seq "ABCD"))   
(\A)

Rather than filtering, maybe we want to apply a function to each object and store the results in a new collection. Again, this need not be done via explicit iteration in Clojure:

user=> (map #(.hashCode %) (seq "ABCZ")) 
(65 66 67 90)

Finally, if you REALLY need to iterate manually through your collection, you can use the loop/recur construct to traverse your sequence tail-recursively, one element at a time: http://clojure.org/functional_programming#Functional%20Programming--Recursive%20Looping. Or you can use ordinary recursive calls.
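A minimal loop/recur sketch of that manual traversal follows. The function print-numbered is a hypothetical example rather than anything from the original code. It takes one element per step with first, moves on with next, and stops when next returns nil. Because recur is in tail position, the stack does not grow.

```clojure
(defn print-numbered
  "Walks any seqable collection one element at a time with loop/recur,
  printing each element prefixed by its 1-based position."
  [coll]
  (loop [remaining (seq coll)
         n 1]
    (when remaining
      (println n (first remaining))
      (recur (next remaining) (inc n)))))
```

For example, (print-numbered (iterator-seq (.iterator (java.util.ArrayList. ["alpha" "beta"])))) prints "1 alpha" and then "2 beta".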


You actually have to call your function. At the moment you pass the function parts-of-speech itself to index-words.
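One way to see this is to macroexpand the threading forms. The sketch only expands quoted forms, so nothing has to be defined. The commented results assume Clojure 1.5 or later, where a single macroexpand-1 of -> or ->> produces the fully nested call. ->> is included because map expects the collection as its last argument.

```clojure
;; Without parens, the symbol parts-of-speech (the function object) is threaded:
(def without-call (macroexpand-1 '(-> parts-of-speech index-words println)))
;; => (println (index-words parts-of-speech))

;; With parens, the *result* of calling parts-of-speech is threaded:
(def with-call (macroexpand-1 '(-> (parts-of-speech) index-words println)))
;; => (println (index-words (parts-of-speech)))

;; ->> threads into the last position instead:
(def thread-last
  (macroexpand-1 '(->> (parts-of-speech) (map index-words) (run! println))))
;; => (run! println (map index-words (parts-of-speech)))
```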

(defn parts-of-speech
  []
  (.values POS))

(defn index-words
  [pos]
  (iterator-seq (.getIndexWordIterator dict pos)))

(-> (parts-of-speech) index-words println)

Note the parens around parts-of-speech. Also note that the (. target method) interop syntax you use is quite dated; the code above uses the more common (.method target) form.
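The question also asks how to get one continuous stream of words instead of four separate seqs. (.values POS) returns every part of speech, while getIndexWordIterator takes a single POS, so index-words has to be applied to each one. mapcat does exactly that and concatenates the results. A doseq with two bindings gives the same flattening. The sketch below replaces the JWI-backed parts-of-speech and index-words with hypothetical in-memory stand-ins (fake-index is made up) so it runs without WordNet. run! requires Clojure 1.7 or later.

```clojure
;; Hypothetical stand-ins for the JWI-backed functions in the question.
(def fake-index
  {:noun ["dog" "cat"]
   :verb ["run"]})

(defn parts-of-speech []
  (keys fake-index))

(defn index-words [pos]
  ;; Same shape as the real code: wrap a java.util.Iterator with iterator-seq.
  (iterator-seq (.iterator (java.util.ArrayList. (get fake-index pos)))))

;; mapcat = map index-words over every POS, then concat into one seq.
(defn all-words []
  (->> (parts-of-speech)
       (mapcat index-words)))

;; Print one word per line; run! is eager and intended for side effects.
(defn print-all-words []
  (->> (parts-of-speech)
       (mapcat index-words)
       (run! println)))

;; The same flattening with doseq: the second binding iterates inside the first.
(defn print-all-words-doseq []
  (doseq [pos (parts-of-speech)
          w   (index-words pos)]
    (println w)))
```

With the real JWI functions, (count (all-words)) would be the total across all four parts of speech rather than four separate counts. Using run! or doseq also avoids the (map println) plus doall combination, which relies on map for side effects.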