7
votes

I am quite new to Clojure, although I am familiar with functional languages, mainly Scala.

I am trying to figure out what is the idiomatic way to operate on collections in Clojure. I am particularly confused by the behaviour of functions such as map.

In Scala, a great care is taken in making so that map will always return a collection of the same type of the original collection, as long as this makes sense:

List(1, 2, 3) map (2 *) == List(2, 4, 6)
Set(1, 2, 3) map (2 *) == Set(2, 4, 6)
Vector(1, 2, 3) map (2 *) == Vector(2, 4, 6)

Instead, in Clojure, as far as I understand, most operations such as map or filter are lazy, even when invoked on eager data-structures. This has the weird result of making

(map #(* 2 %) [1 2 3])

a lazy-list instead of a vector.

While I prefer, in general, lazy operations, I find the above confusing. In fact, vectors guarantee certain performance characteristics that lists do not.

Say I use the result from above and append on its end. If I understand correctly, the result is not evaluated until I try to append on it, then it is evaluated and I get a list instead of a vector; so I have to traverse it to append on the end. Of course I could turn it into a vector afterwards, but this gets messy and can be overlooked.

If I understand correctly, map is polymorphic and it would not be a problem to implement is so that it returns a vector on vectors, a list on lists, a stream on streams (this time with lazy semantics) and so on. I think I am missing something about the basic design of Clojure and its idioms.

What is the reason basic operations on clojure data structures do not preverse the structure?

1
Look at the source code for map. Map doesn't care about the type of the collection. You could build a macro on top of map that remembered the type of the collection, and converted the collection to that type at the end. github.com/clojure/clojure/blob/master/src/clj/clojure/core/…Diego Basch
Take a look at clojure.algo.generic.functor/fmap in github.com/clojure/algo.generic for a map implementation that preserves the input type.Alex

1 Answers

7
votes

In Clojure many functions are based on the Seq abstraction. The benefit of this approach is that you don't have to write a function for every different collection type - as long as your collection can be viewed as a sequence (things with a head and possibly a tail), you can use it with all of the seq functions. Functions that take seqs and output seqs are much more composable and thus re-usable than functions that limit their use to a certain collection type. When writing your own function on seq you don't need to handle special cases like: if the user gives me a vector, I have to return a vector, etc. Your function will fit in just as good inside the seq pipeline as any other seq function.

The reason that map returns a lazy seq is a design choice. In Clojure lazyness is the default for many of these functional constructions. If you want to have other behavior, like parallelism without intermediate collections, take a look at the reducers library: http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html

As far as performance goes, map always has to apply a function n times on a collection, from the first to the last element, so its performance will always be O(n) or worse. In this case vector or list makes no difference. The possible benefit that laziness would give you is when you would only consume the first part of the list. If you must append something to the end of map's output, a vector is indeed more efficient. You can use mapv (added in Clojure 1.4) in this case: it takes in a collection and will output a vector. I would say, only worry about these performance optimizations if you have a very good reason for it. Most of the time it's not worth it.

Read more about the seq abstraction here: http://clojure.org/sequences

Another vector-returning higher order function that was added in Clojure 1.4 is filterv.