I'm using Spark's MLlib (v1.1.0) with Scala to run k-means clustering on a file of points (longitude and latitude). Each line of my file contains four comma-separated fields; the last two are the longitude and latitude.
Here is an example of k-means clustering with Spark: https://spark.apache.org/docs/1.1.0/mllib-clustering.html
What I want to do is read the last two fields of the files in a specific HDFS directory, transform them into an RDD[Vector], and use this method of the KMeans class:

    train(RDD<Vector> data, int k, int maxIterations)
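For reference, I expect the call from Scala to look roughly like this, assuming I already have my points as parsedData: RDD[Vector] (the cluster count and iteration values below are just placeholders I picked):

    import org.apache.spark.mllib.clustering.KMeans

    // k and maxIterations are arbitrary values for illustration
    val numClusters = 4
    val numIterations = 20
    val model = KMeans.train(parsedData, numClusters, numIterations)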
This is my code:

    val data = sc.textFile("/user/test/location/*")
    val parsedData = data.map(s => Vectors.dense(s.split(',').map(fields => (fields(2).toDouble,fields(3).toDouble))))
But when I run it in spark-shell I get the following error:

    error: overloaded method value dense with alternatives:
      (values: Array[Double])org.apache.spark.mllib.linalg.Vector
      (firstValue: Double,otherValues: Double*)org.apache.spark.mllib.linalg.Vector
    cannot be applied to (Array[(Double, Double)])
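If I try the inner map on a sample line in the shell (a line I made up, with fields long enough to index), I can see where the pairs come from: fields is each field String, so fields(2) is a Char, and Char.toDouble returns its character code:

    scala> "abcd,efgh,12.3,45.6".split(',').map(fields => (fields(2).toDouble, fields(3).toDouble))
    res0: Array[(Double, Double)] = Array((99.0,100.0), (103.0,104.0), (46.0,51.0), (46.0,54.0))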
So my map is producing an Array[(Double, Double)], and I don't know how to turn that into the Array[Double] that Vectors.dense expects. Maybe there is another way to read the two fields and convert them into an RDD[Vector]. Any suggestions?
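For example, is something like this the right way to do it? This is the kind of thing I was aiming for: splitting each line once and passing the two coordinates directly to Vectors.dense (an untested sketch, so I may be missing something):

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    // Split each line once and keep only fields 2 and 3 (longitude, latitude)
    val parsedData: RDD[Vector] = data.map { line =>
      val fields = line.split(',')
      Vectors.dense(fields(2).toDouble, fields(3).toDouble)
    }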