7
votes

I'm having issue with using OPTICS implementation in ELKI environment. I have used the same data for DBSCAN implementation and it worked like a charm. Probably I'm missing something with parameters but I can't figure it out, everything seems to be right.

Data is a simple 300х2 matrix, consists of 3 clusters with 100 points in each.

DBSCAN result:

Clustering result of DBSCAN

MinPts = 10, Eps = 1

OPTICS result:

Clustering result of OPTICS

MinPts = 10

2
I read the paper more accurately and discovered that the output of OPTICS algorithm itself is not a classification but a range. Therefore we get information about a density cluster structure (in a form of density plot) and work with it afterwards. I recommend using OpticsXI algorithm in ELKI to classify.Ilya Khaustov

2 Answers

4
votes

You apparently already found the solution yourself, but here is the long story:

The OPTICS class in ELKI only computes the cluster order / reachability diagram.

In order to extract clusters, you have different choices, one of which (the one from the original OPTICS publication) is available in ELKI.

So in order to extract clusters in ELKI, you need to use the OPTICSXi algorithm, which will in turn use either OPTICS or the index based DeLiClu to compute the cluster order.

The reason why this is split into two parts in ELKI probably is so that you can on one hand implement another logic for extracting the clusters, and on the other hand implement different methods like DeLiClu for computing the cluster order. That would align well with the modular architecture of ELKI.

IIRC there is at least one more method (apparently not yet in ELKI) that extracts clusters by looking for local maxima, then extending them horizontally until they hit the end of the valley. And there was a different one that used "inflexion points" of the plot.

4
votes

@AnonyMousse pretty much put it right. I just can't upvote or comment yet.

We hope to have some students contribute the other cluster extraction methods as small student projects over time. They are not essential for our research, but they are good tasks for students that want to learn about ELKI to get started.

ELKI is a fast moving project, and it lives from community contributions. We would be happy to see you contribute some code to it. We know that the codebase is not easy to get started with - it is fairly large, and the generality of the implementation and the support for index structures make it a bit hard to get started. We try to add Tutorials to help you to get started. And once you are used to it, you will actually benefit from the architecture: your algorithms get the benfits of indexing and arbitrary distance functions, while if you would implement from scratch, you would likely only support Euclidean distance, and no index acceleration.

Seeing that you struggled with OPTICS, I will try to write an OPTICS tutorial in the new year. In particular, OPTICS can benefit a lot from using an appropriate index structure.