Predict clusters from data using Spark MLlib KMeans

Question

I have generated cluster centers from the features of my data, for example the 'kmeans_data.txt' sample file found at

https://github.com/apache/spark/blob/master/data/mllib/kmeans_data.txt

This was performed using KMeans in Spark MLlib.

clusters.clusterCenters.foreach(println)

Any idea how to predict which cluster each data point belongs to, given these centers?

Taiwo O. Adetiloye · Accepted Answer · 2016-03-25T17:38:57

Excerpt from the KMeans clustering snippet in the Spark MLlib Scala examples, with a prediction step added at the end:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble)))

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Added: predict the cluster index of each data point in parsedData
clusters.predict(parsedData).foreach(println)
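
To make it clearer what the predicted values mean: each one is the index of the cluster center closest to the point. The following is only a minimal plain-Scala sketch of that idea, not MLlib's actual implementation. The object NearestCenter and its methods are hypothetical names introduced for illustration, and the center values are assumed for the example rather than taken from a real run.

```scala
// Hypothetical helper, not part of Spark MLlib: assigns a point to the
// index of its nearest center using squared Euclidean distance.
object NearestCenter {
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "point and center must have the same dimension")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  // Returns the position in `centers` of the center closest to `point`.
  def predict(centers: Seq[Array[Double]], point: Array[Double]): Int =
    centers.indices.minBy(i => squaredDistance(centers(i), point))

  def main(args: Array[String]): Unit = {
    // Assumed, illustrative centers; a real model may order them differently.
    val centers = Seq(Array(0.1, 0.1, 0.1), Array(9.1, 9.1, 9.1))
    val points = Seq(Array(0.0, 0.0, 0.0), Array(9.2, 9.2, 9.2))
    points.foreach { p =>
      println(s"${p.mkString("[", ", ", "]")} -> cluster ${predict(centers, p)}")
    }
  }
}
```

The index refers to a position in the list of centers, so with a trained model the same index can be used to look up the matching entry in clusters.clusterCenters. Which cluster gets index 0 or 1 is not fixed and can change between training runs.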
