34
votes

There are other questions such as Scala: What is the difference between Traversable and Iterable traits in Scala collections? and How would I get the sum of squares of two Lists in Scala? that answers the question partially. I felt a question that covers all of this in one place makes sense.

2
I'm voting to close 1) because this is not really a specific question but far to general, 2) because the information is likely to grow stale as Scala evolves and 3) because the general case is already well-defined by Scala's own documentation, which is probably a better resource for such broad investigations (and Scala, unlike many other languages, has quite extensive, if sometimes quite challenging documentation).ig0774
@ig0774 2)In my opinion, it is too core concepts of scala collections to be changed significantly in observable future.om-nom-nom
@ig0774 In contrast to your three points, I upvoted rather than vote to close because 1) I feel this question is generally valuable, and tangible enough to be answered succinctly, 2) because it lies near the heart of understanding the current Scala collections library, and 3) the current documentation, though extensive, is very spread out; it's hard to get the big picture.Dan Burton
@ig0774 thanks for the link. I wasn't aware of this documentation scala-lang.org/docu/files/collections-api/collections.html. Good read. But, I think it does make sense this question to be here for a beginner looking for it in so.smartnut007
@om-nom (last week) theoretical discussion of exactly this: groups.google.com/group/scala-internals/browse_thread/thread/…Gene T

2 Answers

34
votes

Traversable is the top of the collections hierarchy. Its main method is 'foreach' so it allows to do something for each element of the collection.

An Iterable can create an Iterator, based on which foreach can be implemented. This defines some order of the elements, although that order might change for every Iterator.

Seq(uence) is an Iterable where the order of elements is fixed. Therefore it makes sense to talk about the index of an element.

Streams are lazy Sequences. I.e. elements of a stream may not be computed before they are accessed. This makes it possible to work with infinite sequences like the sequence of all integers.

Views are non-strict versions of collections. Methods like filter and map on view only execute the passed functions when the respective element gets accessed. Thus a map on a huge collection returns immediately because it just creates a wrapper around the original collection. Only when one accesses an element, the mapping gets actually executed (for that element). Note that View is not a class, but there are lots of XxxView classes for various collections.

3
votes

One comment I'd like to add about streams vs. iterators. Both streams and iterators can be used to implement long, non-strict, potentially infinite collections that don't compute a value until it's needed.

However, there is a tricky problem with "premature execution" that arises when doing this, which can be avoided using iterators but not streams, and in the process points out an important semantic difference between the two. This is perhaps illustrated most clearly as follows:

def runiter(start: Int) {
  // Create a stream that returns successive integers on demand, e.g. 3, 4, 5, ....
  val iter = {
    def loop(v: Int): Stream[Int] = { println("I computed a value", v); v} #:: loop(v+1)
    loop(start)
  }
  // Now, sometime later, we retrieve the values ....
  println("about to loop")
  for (x <- iter) {
    if (x < 10) println("saw value", x) else return
  }
}

This code creates an infinite stream that starts at a given value and returns successive integers. It is used as a stand-in for more complex code that might, for example, open an internet connection and return values from the connection as needed.

Result:

scala> runiter(3)
(I computed a value,3)
about to loop
(saw value,3)
(I computed a value,4)
(saw value,4)
(I computed a value,5)
(saw value,5)
(I computed a value,6)
(saw value,6)
(I computed a value,7)
(saw value,7)
(I computed a value,8)
(saw value,8)
(I computed a value,9)
(saw value,9)
(I computed a value,10)

Note carefully how the execution required to compute the first value occurs BEFORE the place where the stream's values are actually used. If this initial execution involves, for example, opening a file or internet connection and there's a long delay after creating the stream and before any of the values are used, this can be very problematic -- you will end up with an open file descriptor sitting around, and worse, your internet connection might time out, causing the whole thing to fail.

A simple attempt to fix it using an initial empty stream doesn't work:

def runiter(start: Int) {
  // Create a stream that returns successive integers on demand, e.g. 3, 4, 5, ....
  val iter = {
    def loop(v: Int): Stream[Int] = { println("I computed a value", v); v} #:: loop(v+1)
    Stream[Int]() ++ loop(start)
  }
  // Now, sometime later, we retrieve the values ....
  println("about to loop")
  for (x <- iter) {
    if (x < 10) println("saw value", x) else return
  }
}

Result (same as before):

scala> runiter(3)
(I computed a value,3)
about to loop
(saw value,3)
(I computed a value,4)
(saw value,4)
(I computed a value,5)
(saw value,5)
(I computed a value,6)
(saw value,6)
(I computed a value,7)
(saw value,7)
(I computed a value,8)
(saw value,8)
(I computed a value,9)
(saw value,9)
(I computed a value,10)

However, you can fix this by changing the stream to an iterator with an initial empty iterator, even though it's far from obvious that this is the case:

def runiter(start: Int) {
  // Create an iterator that returns successive integers on demand, e.g. 3, 4, 5, ....
  val iter = {
    def loop(v: Int): Iterator[Int] = { println("I computed a value", v); Iterator(v)} ++ loop(v+1)
    Iterator[Int]() ++ loop(start)
  }
  // Now, sometime later, we retrieve the values ....
  println("about to loop")
  for (x <- iter) {
    if (x < 10) println("saw value", x) else return
  }
}

Result:

scala> runiter(3)
about to loop
(I computed a value,3)
(saw value,3)
(I computed a value,4)
(saw value,4)
(I computed a value,5)
(saw value,5)
(I computed a value,6)
(saw value,6)
(I computed a value,7)
(saw value,7)
(I computed a value,8)
(saw value,8)
(I computed a value,9)
(saw value,9)
(I computed a value,10)

Note that if you don't add the initial empty iterator, you will run into the same premature execution problem as with streams.