
I am trying to fit an OPTICS clustering model to my data using Python's scikit-learn:

from sklearn.cluster import OPTICS, cluster_optics_dbscan
from sklearn.preprocessing import StandardScaler

x = StandardScaler().fit_transform(data.loc[:, features])

op = OPTICS(max_eps=20, min_samples=10, xi=0.1)
op = op.fit(x)

From this fitted model, I get the reachability distances (op.reachability_), the ordering of the points (op.ordering_), and the cluster labels (op.labels_).

Now I want to check how the clusters vary as I change the parameter xi (0.1 in this case). Can I do this without refitting the model for every value of xi, which takes a lot of time?

Or, in other words, is there a scikit-learn function that takes the reachability distances (op.reachability_), the ordering (op.ordering_) of the points and xi as input and outputs the cluster labels?

I found the function cluster_optics_dbscan, which "performs DBSCAN extraction for an arbitrary epsilon given reachability-distances, core-distances and ordering and epsilon", but that is not quite what I want.


1 Answer


In principle, you need to call the fit method at least once, since it does the actual cluster computation, as stated in its description.

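However, if you look at the OPTICS source, the cluster_optics_xi function "automatically extract clusters according to the Xi-steep method" by calling the private _xi_cluster and _extract_xi_labels functions, both of which take the xi parameter as input. cluster_optics_xi is itself importable from sklearn.cluster, so you should be able to call it on the reachability_, predecessor_ and ordering_ arrays of the fitted model with a new xi, instead of refitting or refactoring the private helpers.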
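As a rough sketch of that idea, the snippet below fits OPTICS once and then re-extracts labels for several xi values from the stored arrays. The synthetic make_blobs data is only a stand-in for data.loc[:, features], and the xi grid is arbitrary; min_samples (and min_cluster_size, if you set it) should match what was used in the fit.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_xi
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.loc[:, features] (an assumption for illustration).
raw, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
x = StandardScaler().fit_transform(raw)

# Fit once: this is the expensive step that computes reachability and ordering.
min_samples = 10
op = OPTICS(max_eps=20, min_samples=min_samples, xi=0.1).fit(x)

# Re-extract labels for several xi values from the stored arrays, without refitting.
labels_by_xi = {}
for xi in (0.01, 0.05, 0.1, 0.2):
    labels, _clusters = cluster_optics_xi(
        reachability=op.reachability_,
        predecessor=op.predecessor_,
        ordering=op.ordering_,
        min_samples=min_samples,  # keep consistent with the fitted model
        xi=xi,
    )
    labels_by_xi[xi] = labels
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int(np.sum(labels == -1))
    print(f"xi={xi}: {n_clusters} clusters, {n_noise} noise points")
```

With the same xi as the fit (0.1 here), the extracted labels should match op.labels_, which is a quick sanity check that the extraction is set up consistently.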