1
votes

I have about 200 points in the Cartesian plane (2D). I want to cluster these points into k clusters with respect to an arbitrary distance function (not necessarily a metric) and get the so-called centroids, or representatives, of these clusters. I know k-means does this for certain special distance functions such as Euclidean, Manhattan, and cosine. But k-means cannot handle an arbitrary distance function: for example, in the centroid-update phase of k-means with the Euclidean distance, the mean of the points in each cluster is the least-squares estimate and minimizes the sum of squared distances from the points in the cluster to their centroid (the mean); however, the mean may not minimize these distances when the distance function is arbitrary. Could you please help me with this and tell me about any clustering algorithms that can work here?

3
First, note that in the literature, a "distance" (metric) satisfies: (1) d(x,y) = d(y,x); (2) d(x,y) <= d(x,z) + d(z,y); (3) d(x,y) = 0 if and only if x = y. Is this correct in your case as well? – amit
Thanks for your consideration. No, actually (1) and (3) hold in our case, but not (2). There might be x, y, and z for which d(x,y) > d(x,z) + d(z,y). – user3314148

3 Answers

2
votes

If you replace "mean" with "most central point in cluster", then you get the k-medoids algorithm. Wikipedia claims that a metric is required, but I believe that to be incorrect, since I can't see where the majorization-minimization proof needs the triangle inequality or even symmetry.
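For concreteness, here is a minimal k-medoids sketch in Python (Voronoi-iteration style rather than the full PAM swap search). The `k_medoids` function and its parameters are illustrative names, not a library API, and note that nothing in it uses symmetry or the triangle inequality:

```python
import random

def k_medoids(points, k, dist, max_iter=100, seed=0):
    """Cluster `points` into k clusters under an arbitrary dissimilarity."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid is the cluster member minimizing the
        # sum of dissimilarities to the other members ("most central point").
        new_medoids = [
            min(c, key=lambda m: sum(dist(m, p) for p in c)) if c else old
            for c, old in zip(clusters, medoids)
        ]
        if new_medoids == medoids:  # converged: medoids stopped moving
            break
        medoids = new_medoids
    return medoids, clusters

# Example with a non-metric dissimilarity: squared Euclidean distance
# violates the triangle inequality, but k-medoids still applies.
pts = [(random.random(), random.random()) for _ in range(200)]
d = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
medoids, clusters = k_medoids(pts, k=5, dist=d)
```

Each returned medoid is an actual data point, which is usually what "representative" has to mean once the mean is no longer well defined for your distance.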

2
votes

There are various clustering algorithms that can work with arbitrary distance functions, in particular:

  • hierarchical clustering
  • k-medoids (PAM)
  • DBSCAN
  • OPTICS
  • many more; consult a good clustering book and/or software package (a sketch for the first two follows below)

But the only one of these that enforces k clusters and uses a "cluster representative" model is k-medoids. You may be putting too many constraints on the cluster model to have a wider choice.
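If you do want to try the other options, most of them can be driven from precomputed pairwise distances. A hedged sketch with SciPy, whose `pdist` accepts an arbitrary callable and whose `fcluster` can cut a hierarchical tree into exactly k clusters; the dissimilarity below is only a placeholder:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

points = np.random.rand(200, 2)           # ~200 points in the plane
d = lambda a, b: np.sum((a - b) ** 2)     # arbitrary (non-metric) dissimilarity

condensed = pdist(points, metric=d)       # condensed pairwise distance vector
Z = linkage(condensed, method='average')  # average linkage needs no geometry
labels = fcluster(Z, t=5, criterion='maxclust')  # force exactly k = 5 clusters
```

Hierarchical clustering gives you no representative by itself, but you can always pick the medoid of each resulting cluster afterwards.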

0
votes

Since you want something that represents a centroid but is not one of the data points, a technique I once used was to run something like k-medoids on N random samples, then take all the members of each cluster and use them as training examples to build a classifier that returns a class label. In the end, each class label returned by the classifier is an abstract notion of a cluster/centroid. I did this for a very specific and nuanced reason, and I know the flaws. If you don't want to have to specify k, and your vectors are not enormous and super sparse, then I would take a look at the Cobweb clustering in JavaML; JavaML also has a decent k-medoids.
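For what it's worth, here is a rough sketch of that sample-then-classify idea, assuming the optional scikit-learn-extra package for a k-medoids that accepts a precomputed distance matrix; the 1-nearest-neighbour classifier is just one possible stand-in for "build a classifier from the cluster members":

```python
import numpy as np
from sklearn_extra.cluster import KMedoids          # pip install scikit-learn-extra
from sklearn.neighbors import KNeighborsClassifier

points = np.random.rand(200, 2)
d = lambda a, b: np.sum((a - b) ** 2)               # arbitrary dissimilarity

# 1. Cluster a random sample with k-medoids on a precomputed distance matrix.
sample = points[np.random.choice(len(points), 50, replace=False)]
D = np.array([[d(a, b) for b in sample] for a in sample])
labels = KMedoids(n_clusters=5, metric='precomputed',
                  random_state=0).fit_predict(D)

# 2. Use the labeled sample to train a classifier; the class label it
#    predicts acts as an abstract "centroid" for each cluster.
clf = KNeighborsClassifier(n_neighbors=1).fit(sample, labels)
all_labels = clf.predict(points)
```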