Please forgive the lack of formal terms, I've only recently approached ML.
For learning purposes, I decided to try a Ruby implementation of the DBSCAN algorithm (https://github.com/matiasinsaurralde/dbscan).
Building on the simple example at https://github.com/matiasinsaurralde/dbscan/blob/master/examples/simple.rb I created an array of 1000 arrays, each holding two random values, "x" and "y" (a 2d point) and I've then fed this data to the DBSCAN algorithm (tweaking "epsilon" and "minimum distance" as needed).
data_sample = Array.new(1000) { Array.new(2) { rand(100).round } }
dbscan = DBSCAN( data_sample, :epsilon => 3, :min_points => 2, :distance => :euclidean_distance )
and then I've then exported the resulting data (clusters and unclustered data) to Excel, to draw a graph of clusters and unclustered data.
This is what I came up with:
Black dots is unclustered data.
Now there is one thing I am unsure about: for some points very close to each other, or points who share the same exact x and y, I am seeing that one of the two points, instead of making it into the sane cluster of the other point, is categorized as unclustered.
Take a look at point 47, 74: the point belonging to a cluster is "above" another unclustered point. This also happens at 14, 87, at 77,64, at 20,61 and in many other places (for some couple of points, they have the same x and y).
Now as I said I'm still approaching to this, so could anyone please tell me if there is an explanation for what I am seeing? Does it have to do with the inner workings of the DBSCAN algorithm? Or is it more likely that there are some errors in the implementation of the algorithm? Or some wrong assumption I am making?
I hope this is all you need to know but if you need more please just ask.