I'm trying to implement k-means clustering.

I have a set of points with coordinates (x, y), and I am using Euclidean distance as the distance measure. I've computed the distances between all pairs of points and stored them in a matrix:

dist[i][j] - distance between points i and j

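Suppose I choose point 3 because it is the farthest from point 1, i.e. dist[1][3] is the largest entry in row 1.

If I then search for the point farthest from point 3, I may get some point j with a large dist[3][j], but dist[1][j] may be very small.

[Point j is far from point 3 but near point 1.]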

So how do I choose the k farthest points using the distance matrix?

Why don't you sort the pairs of [i][j] in ascending order? Last time I studied data mining I believe that was the solution... sort and then find the k-largest or k-smallest values. - Andon M. Coleman

1 Answer

Note that the k farthest points do not necessarily yield the best result: points that are far from everything else tend to be outliers on the edge of the data, so they are usually poor estimates of the cluster centers.
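If you still want "k farthest points", one common way to make that precise is a greedy max-min selection (often called farthest-first traversal): each new point is the one whose distance to its nearest already-chosen point is largest. That avoids picking a point j that is far from point 3 but close to point 1. Here is a minimal sketch, assuming `dist` is a full symmetric matrix stored as a list of lists; the names `distance_matrix` and `farthest_first` and the `start` parameter are mine, not from any library.

```python
import math


def distance_matrix(points):
    """Full Euclidean distance matrix for a list of (x, y) tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def farthest_first(dist, k, start=0):
    """Greedily pick k point indices, each maximizing its distance
    to the nearest point chosen so far (assumed helper, not a library API)."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # min_dist[j] = distance from point j to its nearest chosen point
    min_dist = list(dist[start])
    while len(chosen) < k:
        # The next point is the one farthest from ALL chosen points,
        # measured by its distance to the closest of them.
        nxt = max(range(n), key=lambda j: min_dist[j])
        chosen.append(nxt)
        for j in range(n):
            min_dist[j] = min(min_dist[j], dist[nxt][j])
    return chosen


points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
dist = distance_matrix(points)
print(farthest_first(dist, 3))  # -> [0, 4, 2]
```

Keep in mind this is deterministic for a given start point and is drawn to outliers, which is exactly the weakness mentioned above.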

Plus, since k-means heuristics may get stuck in a local minimum, you will want a randomized algorithm that allows you to restart the process multiple times and get potentially different results.

You may want to look at k-means++, which is a well-known and effective heuristic for k-means initialization.
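Since you already have the distance matrix, the k-means++ seeding step can be run directly on it: pick the first seed uniformly at random, then pick each further seed with probability proportional to the squared distance to its nearest existing seed. A rough sketch, assuming the same list-of-lists `dist` as in the question; the function name `kmeans_pp_seeds` and the `rng` parameter are just illustrative.

```python
import random


def kmeans_pp_seeds(dist, k, rng=None):
    """Pick k seed indices with k-means++ style sampling from a
    precomputed distance matrix (illustrative helper, not a library API)."""
    rng = rng or random.Random()
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # First seed: uniformly at random.
    seeds = [rng.randrange(n)]
    # d2[j] = squared distance from point j to its nearest seed.
    d2 = [dist[seeds[0]][j] ** 2 for j in range(n)]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every remaining point coincides with a seed; choose uniformly.
            nxt = rng.choice([j for j in range(n) if j not in seeds])
        else:
            # Far-away points are more likely, but not guaranteed, to be picked.
            nxt = rng.choices(range(n), weights=d2, k=1)[0]
        seeds.append(nxt)
        d2 = [min(d2[j], dist[nxt][j] ** 2) for j in range(n)]
    return seeds
```

Calling it with different random generators (for example `kmeans_pp_seeds(dist, 3, random.Random(0))` and then `random.Random(1)`) gives you the restarts mentioned above. Note that the matrix only covers the seeding: the later k-means iterations move the centers to cluster means, which are generally not data points, so you still need the (x, y) coordinates for those.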