In this case, a picture is worth a thousand words:
[Image: Hello Purple Cluster!]
How did this come about? First, a description of the data and the DBSCAN settings:
- 100 data points
- Using a custom distance metric to generate a precomputed distance matrix
- The custom distance metric is simple: only consider the r0 (x-distance)
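
For concreteness, here is a minimal sketch of this kind of setup. It assumes scikit-learn's DBSCAN with `metric="precomputed"`; the synthetic points, the helper name `x_distance_matrix`, and the `eps` / `min_samples` values are placeholders for illustration, not the data or settings behind the plot.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for the 100 data points (NOT the data from the plot).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(100, 2))


def x_distance_matrix(pts):
    """Pairwise distance that only considers the x-coordinate (column 0)."""
    x = pts[:, 0]
    return np.abs(x[:, None] - x[None, :])


# Square, symmetric matrix with zeros on the diagonal.
dist = x_distance_matrix(points)

# eps and min_samples are hypothetical values chosen only for this sketch.
model = DBSCAN(eps=0.3, min_samples=5, metric="precomputed")
labels = model.fit_predict(dist)
```

With a metric like this, two points with nearly equal x-values are treated as close no matter how far apart they are vertically.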
Everything besides that rogue pair of purple points looks precisely as desired. How did that purple cluster jump clear over the Yellow Wall and claim that rogue pair of points on the top left?
Update: It has been verified that there are exactly three clusters, i.e. this is not a bug in choosing three colors to denote four clusters. The verification came directly from the DBSCAN predicted outputs (not some hypothesis of mine):
NumClusters is 3 counts are (array([-1, 0, 1]), array([ 8, 67, 25]))
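
A count like the one above can be produced with `np.unique` on the label array. The sketch below rebuilds a label array with the same counts purely for illustration. It assumes the labels come from scikit-learn's DBSCAN, which reserves the label -1 for noise samples rather than for a cluster, so it may be worth counting that label separately.

```python
import numpy as np

# Label array rebuilt from the counts printed above (illustration only):
# 8 points labelled -1, 67 labelled 0, 25 labelled 1.
labels = np.repeat([-1, 0, 1], [8, 67, 25])

# Distinct label values and how many points carry each one.
values, counts = np.unique(labels, return_counts=True)

# Under scikit-learn's convention, -1 marks noise, not a cluster.
n_clusters = len(set(values.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))

print("labels:", values, "counts:", counts)
print("clusters (excluding -1):", n_clusters, "noise points:", n_noise)
```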
Another update: To clarify, the 2 purple points are being added to the far-right cluster (also in purple). They are not a fourth cluster. So the question is: why are those points being added to the furthest cluster instead of the nearby green and yellow ones?

