Dead simple clustering algorithm

Question

What is a good clustering algorithm that simply puts two data items into the same cluster if their separation is less than some user specified cutoff?

i.e. the result of X_clustering(data, distance, epsilon) is a set of cluster assignments such that for any pair i,j, they are in the same cluster if distance(data[i], data[j]) < epsilon. If distance(data[i], data[j]) >= epsilon they can be in different clusters (if there aren't other data that end up linking them...).

Another way of stating it is: i,j are in the same cluster if there exists a path [i, x, y, z..., j] through the data such that each step is of distance<epsilon, and they are in different clusters if no such path exists.

OmG OmG · Accepted Answer · 2020-04-17T14:10:08

Your idea is not working. If for all pairs of (data[i], data[j]) in a cluster, their distance is less than a given epsilon, it means all members of that cluster are located in a circle with the radius of epsilon. Hence, this clustering method cannot be generalized.

By the way, DBSCAN is a good clustering algorithm in case of determining clusters base on their density with a given epsilon. You can modify this algorithm by adding a stronger constraint such that:

each data can be added to a cluster if its distance to all members of the cluster is less than a given epsilon.

Dead simple clustering algorithm

1 Answers