I have a one-dimensional data set whose histogram shows multiple local maxima, so I know there are several regions in my one-dimensional space where the data is denser. I want to determine boundaries for these dense regions so that I can classify which dense region / cluster a given data point belongs to. For this I am using OPTICS, because it should cope better than DBSCAN with clusters of differing densities.
I am using ELKI (version 0.6.0) from Java code (I know the ELKI team advises against embedding ELKI in Java, but I need to repeat my workflow for many datasets, so it's better to automate this in my case). The code snippet below prints the indices of the start and end items of the clusters. The ELKI documentation on OPTICSModel does not clearly define what these index numbers refer to, but I assume they are the positions of the start and end data items in the augmented cluster ordering of the database (like the ClusterOrderResult object created by OPTICS.run()), as opposed to the indices of the start and end data items in the (unordered) database itself.
ListParameterization opticsParams = new ListParameterization();
opticsParams.addParameter(OPTICSXi.XI_ID, 0.01);
opticsParams.addParameter(OPTICS.MINPTS_ID, 100);
OPTICSXi<DoubleDistance> optics = ClassGenericsUtil.parameterizeOrAbort(OPTICSXi.class, opticsParams);
ArrayAdapterDatabaseConnection arrayAdapterDatabaseConnection = new ArrayAdapterDatabaseConnection(myListOfOneDimensionalFeatureVectors.toArray(new double[myListOfOneDimensionalFeatureVectors.size()][2]));
ListParameterization dbParams = new ListParameterization();
dbParams.addParameter(AbstractDatabase.Parameterizer.INDEX_ID, RStarTreeFactory.class);
dbParams.addParameter(RStarTreeFactory.Parameterizer.BULK_SPLIT_ID, SortTileRecursiveBulkSplit.class);
dbParams.addParameter(AbstractDatabase.Parameterizer.DATABASE_CONNECTION_ID, arrayAdapterDatabaseConnection);
Database db = ClassGenericsUtil.parameterizeOrAbort(StaticArrayDatabase.class, dbParams);
db.initialize();
Clustering<OPTICSModel> result = optics.run(db);
List<Cluster<OPTICSModel>> clusters = result.getAllClusters();
for (Cluster<OPTICSModel> cluster : clusters) {
    if (!cluster.isNoise()) {
        System.out.println(cluster.getModel().getStartIndex() + ", " + cluster.getModel().getEndIndex() + "; ");
    }
}
Now I want to know where in my one-dimensional space my clusters start and end. Therefore I would like to retrieve the data items corresponding to the start and end indices that my code above already obtains. I assume I would need a ClusterOrderResult object for that, from which I could then look up the items at those indices. According to the documentation, however, it does not seem possible to retrieve such an object from the Clustering result that I get by calling optics.run(). As there seemed to be no way of obtaining this ordered database, I naively tried looking up the indices in my original input dataset instead, by replacing the println in the code above with the println below:
System.out.println(myListOfOneDimensionalFeatureVectors.get(cluster.getModel().getStartIndex())[0] + ", " + myListOfOneDimensionalFeatureVectors.get(cluster.getModel().getEndIndex())[0] + "; ");
As I already expected, however, the indices do not seem to refer to the original input data: this regularly prints end boundaries with lower values in my one-dimensional space than the corresponding start boundaries. Does anybody know a way to obtain the original one-dimensional data values that correspond to the start and end indices found by OPTICS clustering? I want to use these values later in my code.
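To make the lookup I am after concrete, here is a minimal plain-Java sketch that does not use ELKI at all. It assumes I could somehow get hold of the cluster order as an array of original row indices; the `clusterOrder` array, the class `ClusterOrderLookup` and its methods are hypothetical names of my own, not ELKI API. It also assumes the end index is inclusive, which I have not been able to confirm from the documentation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ELKI: maps positions in a cluster order
// back to values in the original (unordered) input list.
final class ClusterOrderLookup {

    /** Value of the item found at the given position of the cluster order. */
    static double valueAt(List<double[]> data, int[] clusterOrder, int position) {
        // clusterOrder[position] is assumed to be the row index in the original input.
        return data.get(clusterOrder[position])[0];
    }

    /**
     * Smallest and largest value among the items at positions start..end
     * (end assumed inclusive). The first and last items of a cluster in the
     * cluster order need not be its extreme values, so all members are scanned.
     */
    static double[] bounds(List<double[]> data, int[] clusterOrder, int start, int end) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int pos = start; pos <= end; pos++) {
            double v = valueAt(data, clusterOrder, pos);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] { min, max };
    }

    public static void main(String[] args) {
        // Toy input: two dense regions, around 1.0 and around 5.0.
        List<double[]> data = Arrays.asList(
            new double[] { 5.0 }, new double[] { 1.0 }, new double[] { 1.2 },
            new double[] { 5.1 }, new double[] { 0.9 });
        // Made-up cluster order: positions 0..2 form the cluster around 1.0.
        int[] clusterOrder = { 1, 4, 2, 0, 3 };

        // Naive lookup in the original list, as in my println above:
        System.out.println(data.get(0)[0] + ", " + data.get(2)[0]); // prints "5.0, 1.2"

        // Lookup through the cluster order:
        double[] b = bounds(data, clusterOrder, 0, 2);
        System.out.println(b[0] + ", " + b[1]); // prints "0.9, 1.2"
    }
}
```

With this toy data the naive lookup gives a start value above the end value, which is the same symptom I see with my real data. What I am missing is the ELKI equivalent of `clusterOrder`.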