1
votes

I have a precomputed similarity matrix Sim, where s_ij is the similarity between vector i and vector j.

I am trying to compute clusters from it like this:

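  from sklearn.cluster import SpectralClustering
  clustering = SpectralClustering(n_clusters=cluster_count, affinity='precomputed', eigen_solver='arpack')
  # fit_predict fits the model and returns the labels, so a separate fit() call is not needed
  clusters = clustering.fit_predict(sparse_dok_sim_matrix)
  print(clusters)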

I get something that looks like cluster labels, but it is completely wrong. The weight of the edges between samples in the same cluster is 99% of the total edge weight in the graph. The clustering result appears to be completely random and meaningless.

Any advice? Maybe I'm doing it wrong.

For instance I attempted this with dbscan and got nothing:

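import numpy as np
from scipy.linalg import block_diag
from sklearn.cluster import dbscan
# Distance matrix: 0 within each block, 1000 between blocks
results = block_diag(np.ones((3,3)), np.ones((3,3)), np.ones((4,4)))
results = 1000 * (np.ones((len(results), len(results))) - results)
print(results)
print(dbscan(X=results.astype(float), metric='precomputed'))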

This was the result. It labelled everything as noise, even though it's clear that the first three points are in the same spot, as are the next three and the last four.

[[    0.     0.     0.  1000.  1000.  1000.  1000.  1000.  1000.  1000.]
 [    0.     0.     0.  1000.  1000.  1000.  1000.  1000.  1000.  1000.]
 [    0.     0.     0.  1000.  1000.  1000.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.     0.     0.     0.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.     0.     0.     0.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.     0.     0.     0.  1000.  1000.  1000.  1000.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]
 [ 1000.  1000.  1000.  1000.  1000.  1000.     0.     0.     0.     0.]]
(array([], dtype=int64), array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1]))

1 Answer

2
votes

For DBSCAN: according to the documentation, min_samples defaults to 5. None of your "clusters" has 5 samples, so every point is labelled as noise. For SpectralClustering, I can't help you without more details.
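Coming back to the DBSCAN part, here is a minimal sketch of the same toy example with min_samples lowered. The value 3 is only an illustrative choice that matches your smallest block, and the names blocks and distances are mine, not from your code; eps is left at its default.

```python
import numpy as np
from scipy.linalg import block_diag
from sklearn.cluster import dbscan

# Same distance matrix as in the question: 0 inside each block, 1000 between blocks.
blocks = block_diag(np.ones((3, 3)), np.ones((3, 3)), np.ones((4, 4)))
distances = 1000.0 * (1.0 - blocks)

# min_samples=3 is an assumption chosen to match the smallest block;
# eps keeps its default value of 0.5.
core_samples, labels = dbscan(distances, metric='precomputed', min_samples=3)
print(labels)
```

With this setting no point should be labelled -1 any more, and the three blocks should come out as three separate clusters. On real data you would probably need to tune eps as well.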