I need to run the K-means clustering algorithm on textual data, using cosine distance instead of Euclidean distance. Is there a reliable implementation of this in Python?
Edit:
I have tried to use NLTK as follows:
import nltk
from nltk.cluster import KMeansClusterer

NUM_CLUSTERS = 3
kclusterer = KMeansClusterer(NUM_CLUSTERS, distance=nltk.cluster.util.cosine_distance, repeats=25)
clstr = kclusterer.cluster(X, assign_clusters=False, trace=False)
print(clstr)
But it gives me this error:
TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]
X here is a TF-IDF matrix of shape (15, 155).
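For reference, here is a minimal sketch of what seems to trigger the error and one possible workaround. It is not a confirmed fix. It assumes X is a SciPy sparse matrix (which is what a TF-IDF vectorizer typically returns) and that NLTK's clusterer calls len() on its input, which sparse matrices do not support. The random stand-in matrix, the fixed rng seed, avoid_empty_clusters=True and assign_clusters=True are my own choices for the illustration, not part of the original attempt.

```python
import random

import numpy as np
from scipy import sparse
from nltk.cluster import KMeansClusterer
from nltk.cluster.util import cosine_distance

# Stand-in for the TF-IDF matrix (assumption): a random non-negative sparse
# matrix with the same shape as in the question, (15, 155).
np_rng = np.random.default_rng(0)
dense = np_rng.random((15, 155)) * (np_rng.random((15, 155)) < 0.2)
dense[:, 0] += 0.1  # keep every row non-zero so cosine distance is defined
X = sparse.csr_matrix(dense)

NUM_CLUSTERS = 3
kclusterer = KMeansClusterer(
    NUM_CLUSTERS,
    distance=cosine_distance,
    repeats=25,
    rng=random.Random(0),       # fixed seed so repeated runs give the same result
    avoid_empty_clusters=True,  # guard against a cluster ending up empty
)

# Pass a dense 2-D array instead of the sparse matrix: each row is one vector.
labels = kclusterer.cluster(X.toarray(), assign_clusters=True)
print(labels)
```

With assign_clusters=True the call returns one cluster index per row. Converting to a dense array is cheap for a (15, 155) matrix, but it may not scale to a large vocabulary.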