1
votes

I am new to Scala and am practicing it by implementing the k-means algorithm, following a k-means tutorial.

I am confused by this part of this tutorial:

var newCentroids = pointsGroup.mapValues(ps => average(ps)).collectAsMap()  

This causes a type mismatch error because the function average expects a Seq, but we pass it an Iterable. What causes this error, and how can I fix it?

2
I assume the tutorial was written at the time of Spark 0.9.0, when groupByKey returned RDD[(K, Seq[V])], whereas it now gives us RDD[(K, Iterable[V])]. - Odomontois

2 Answers

4
votes

Seq is a subtype of Iterable, but not vice versa, so the type system will not accept an Iterable where a Seq is required.

There is an explicit conversion available by writing average(ps.toSeq). This conversion will iterate the Iterable and collect the items into a Seq.
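A minimal sketch of both points, using plain Scala collections rather than Spark. The average here is a hypothetical stand-in that takes Seq[Double] instead of the tutorial's Seq[Vector], so that it runs without Spark on the classpath:

```scala
object ToSeqSketch {
  // Hypothetical stand-in for the tutorial's average: it insists on a Seq.
  def average(xs: Seq[Double]): Double = xs.sum / xs.size

  // A value whose static type is only Iterable, like the grouped values
  // that groupByKey produces in newer Spark versions.
  val ps: Iterable[Double] = Set(1.0, 2.0, 6.0)

  // average(ps)  // does not compile: found Iterable[Double], required Seq[Double]

  // Explicit conversion: toSeq yields a Seq with the same elements.
  val mean: Double = average(ps.toSeq)

  // The other direction needs no conversion, because Seq is a subtype of Iterable.
  val widened: Iterable[Double] = Seq(1.0, 2.0, 6.0)
}
```

In the question's code, the corresponding change would be pointsGroup.mapValues(ps => average(ps.toSeq)).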

2
votes

We could simply replace Seq with Iterable in the tutorial's average function:

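def average(ps: Iterable[Vector]): Vector = {
  val numVectors = ps.size
  // Copy the first vector's elements so that an in-place += cannot alter the input.
  val out = new Vector(ps.head.elements.clone())
  // Add the remaining vectors; the head is already counted in out.
  ps.tail.foreach(out += _)
  out / numVectors
}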

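Or even in constant extra space, computing one coordinate at a time:

def average(ps: Iterable[Vector]): Vector = {
  val numVectors = ps.size

  val vSize = ps.head.elements.length

  // Mean of one coordinate; the iterator avoids building an intermediate collection.
  def element(index: Int): Double = ps.iterator.map(_(index)).sum / numVectors

  new Vector((0 until vSize).map(element).toArray)
}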
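For readers who do not have the tutorial's Vector class at hand, averaging over an Iterable can also be sketched with plain Array[Double] values. This is an illustrative variant, not the tutorial's code: the name averageArrays and the use of arrays in place of Vector are assumptions. It walks the Iterable once and keeps only a running sum and a count:

```scala
object AverageSketch {
  // Hypothetical variant of average over plain arrays instead of Spark's Vector.
  // Assumes ps is non-empty and all arrays have the same length.
  def averageArrays(ps: Iterable[Array[Double]]): Array[Double] = {
    require(ps.nonEmpty, "cannot average an empty collection")
    val sum = new Array[Double](ps.head.length) // running sum, starts at zero
    var count = 0
    for (p <- ps) {
      require(p.length == sum.length, "vectors of different length")
      var i = 0
      while (i < sum.length) { sum(i) += p(i); i += 1 }
      count += 1
    }
    // Divide each coordinate of the sum by the number of vectors seen.
    sum.map(_ / count)
  }
}
```

Because the accumulator is a fresh array, none of the input arrays is modified.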