I am trying to implement Kmeans Clustering in Spark Scala.
So currently I have an RDD which looks like this- It contains the cluster centers alond with the data point.
scala> res2.collect
res54: Array[(Int, Array[Any])] = Array((2,Array(19, 15, 39)), (2,Array(21, 15, 81)), (2,Array(20, 16, 6)), (1,Array(23, 16, 77)), (2,Array(31, 17, 40)), (3,Array(22, 17, 76)), (1,Array(35, 18, 6)), (3,Array(23, 18, 94)), (1,Array(64, 19, 3)), (1,Array(30, 19, 72)))
My next step is to sum up the arrays elementwise based on their keys and divide the result by the count (to find the new set of centroids by averaging).
I am not able to figure how to achieve this since simply using reduceByKey(__+_) won't work for arrays.