What is a good clustering algorithm that simply puts two data items into the same cluster if their separation is less than some user-specified cutoff?
That is, the result of X_clustering(data, distance, epsilon) is a set of cluster assignments such that any pair i, j is in the same cluster if distance(data[i], data[j]) < epsilon. If distance(data[i], data[j]) >= epsilon, they can be in different clusters (provided no other data points end up linking them).
Another way of stating it: i and j are in the same cluster if there exists a path [i, x, y, z, ..., j] through the data such that each step has distance < epsilon, and they are in different clusters if no such path exists.
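To make the path-based definition concrete: it amounts to taking the connected components of a graph in which i and j are joined whenever distance(data[i], data[j]) < epsilon. Below is a minimal sketch of that behaviour, not a recommendation of a specific library. The name threshold_clustering is hypothetical and stands in for X_clustering. The sketch assumes distance is symmetric and that comparing all O(n^2) pairs is affordable.

```python
def threshold_clustering(data, distance, epsilon):
    """Return one integer label per item. Items i and j share a label
    exactly when a chain of steps, each shorter than epsilon, links them.
    (Hypothetical helper illustrating the X_clustering behaviour.)"""
    n = len(data)
    parent = list(range(n))  # union-find forest: every item starts alone

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair closer than the cutoff (strict <).
    for i in range(n):
        for j in range(i + 1, n):
            if distance(data[i], data[j]) < epsilon:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


points = [0.0, 0.5, 1.0, 5.0, 5.4]
print(threshold_clustering(points, lambda a, b: abs(a - b), 0.6))
# prints [0, 0, 0, 1, 1]
```

In this example 0.0 and 1.0 are farther apart than epsilon, yet they land in the same cluster because 0.5 links them, which is the chaining behaviour described above.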