K-means clustering initialization

Question

In k-means clustering, how do I start the process?

Should I choose the k farthest points, or k random points, as the initial clusters and then assign the remaining points to those clusters?

Or

should I choose a single point and then check every other point against it (Euclidean distance): if the distance is below a THRESHOLD, add the point to that cluster, otherwise form a new cluster?

mattnedrich · Accepted Answer · 2013-11-20T16:55:31

To seed the K-Means algorithm, it's standard to choose K random observations from your data set. Since K-Means is subject to local optima (i.e., depending on the initialization, it doesn't always find the best solution), it's also standard to run it several times with different initializations and choose the result with the lowest error.
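To make that concrete, here is a minimal pure-Python sketch of random seeding plus restarts. It is an illustration, not a reference implementation: the function names (`kmeans_once`, `kmeans`), the `n_init` and `max_iter` parameters, and the choice of the sum of squared distances to the assigned centroid as the "error" are all assumptions of mine, and points are assumed to be tuples of numbers.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run seeded with k random observations (hypothetical helper)."""
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Move the centroid to the mean of its assigned points.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Empty cluster: keep the old centroid (one of several options).
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    labels = assign(points, centroids)
    # Error measure assumed here: sum of squared distances to assigned centroid.
    error = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, error

def kmeans(points, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the lowest error."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        run = kmeans_once(points, k, rng)
        if best is None or run[2] < best[2]:
            best = run
    return best

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels, error = kmeans(data, k=2)
    print(centroids, labels, error)
```

If you use a library instead, check whether it already does the restarts for you; scikit-learn's `KMeans`, for example, exposes `init` and `n_init` parameters for this purpose.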

