0
votes

I have a similarity matrix that I have calculated between a large number of objects, and each object can have a non-zero similarity with any other object. I generated this matrix for another task, and would now like to cluster it for a new analysis.

It seems like scikit-learn's spectral clustering method could be a good fit, because I can pass in a precomputed affinity matrix. I also know that spectral clustering typically uses some number of nearest neighbors when building the affinity matrix, and my similarity matrix does not have that same constraint.

If I pass in a matrix that allows any number of edges between nodes in the affinity matrix, will scikit-learn limit each node to having only a certain number of nearest neighbors? If not, I guess I will have to make that change to my precomputed affinity matrix.

2

2 Answers

1
votes

You don't have to compute the affinity yourself to do spectral clustering; sklearn does that for you.

When you call sc = SpectralClustering(), the affinity parameter lets you choose the kernel used to compute the affinity matrix. rbf seems to be the default kernel, and it doesn't use a particular number of nearest neighbours. However, if you choose another kernel, such as nearest_neighbors, you might want to specify that number with the n_neighbors parameter.

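You can then use sc.fit_predict(your_matrix) to compute the clusters. With the default kernel, your_matrix is the raw feature data; if it is already a similarity matrix, create the estimator with affinity='precomputed' so the matrix is used directly.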
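For the case in the question, here is a minimal sketch of clustering a dense similarity matrix as-is. The toy matrix, the choice of two clusters, and the variable names are assumptions made for illustration; only SpectralClustering, its affinity, n_clusters and random_state parameters, and fit_predict come from scikit-learn.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy symmetric similarity matrix (illustrative only): two groups of five
# objects, with high similarity inside a group and a low but non-zero
# similarity across groups, so every object is connected to every other.
similarity = np.full((10, 10), 0.05)
similarity[:5, :5] = 0.9
similarity[5:, 5:] = 0.9
np.fill_diagonal(similarity, 1.0)

# affinity="precomputed" makes scikit-learn treat the input as the affinity
# matrix itself instead of as rows of feature vectors.
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(similarity)
print(labels)
```

In this mode the values in the matrix are used as edge weights directly, so a fully dense matrix stays fully dense unless you sparsify it yourself beforehand.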

1
votes

Spectral clustering does not require a sparsified matrix.

But if I'm not mistaken, it's faster to find the eigenvectors for the smallest non-zero eigenvalues of a sparse matrix than of a dense one. The worst case may remain O(n^3), though; spectral clustering is one of the slowest methods you can find.
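If you do want a sparse graph for speed, one possible approach is to keep only each object's k strongest similarities before clustering. The helper below is a hypothetical sketch, not part of scikit-learn: the name keep_top_k, the choice to zero the diagonal, and the "either endpoint" symmetrization rule are all assumptions.

```python
import numpy as np
from scipy import sparse


def keep_top_k(similarity, k):
    """Hypothetical helper: keep each row's k largest off-diagonal
    similarities and return a symmetric sparse matrix. Requires k < n."""
    sim = np.array(similarity, dtype=float)   # work on a copy
    np.fill_diagonal(sim, 0.0)                # ignore self-similarity when ranking
    n = sim.shape[0]
    # Column indices of the k largest entries in every row (in no particular order).
    top = np.argpartition(sim, -k, axis=1)[:, -k:]
    rows = np.repeat(np.arange(n), k)
    cols = top.ravel()
    knn = sparse.csr_matrix((sim[rows, cols], (rows, cols)), shape=(n, n))
    # Keep an edge if either endpoint ranks the other among its top k,
    # which makes the result symmetric.
    return knn.maximum(knn.T)


# Usage on a random symmetric similarity matrix (illustrative data only).
rng = np.random.default_rng(0)
a = rng.random((8, 8))
sparse_affinity = keep_top_k((a + a.T) / 2, k=3)
```

The result can be passed to SpectralClustering(affinity="precomputed") in place of the dense matrix. Keep in mind that aggressive sparsification can split the graph into disconnected pieces, which may hurt the spectral embedding.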