How to implement the K-Means clustering algorithm with a cosine distance measure?

Question

I need to run the K-means clustering algorithm on textual data, but using cosine distance instead of Euclidean distance. Is there a reliable implementation of this in Python?

Edit:

I have tried to use NLTK as follows:

NUM_CLUSTERS = 3
kclusterer = KMeansClusterer(NUM_CLUSTERS,
                             distance=nltk.cluster.util.cosine_distance,
                             repeats=25)

clstr = kclusterer.cluster(X, clusters=False, trace=False)
print(clstr)

But it gives me this error:

TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]

X here is a TF-IDF matrix of shape (15, 155).

joostblack · Accepted Answer · 2020-12-04T15:40:52

You can use NLTK for this. The K-means from NLTK allows you to specify which measure of distance you want to use.
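As a rough sketch of what that could look like: the error in the question most likely comes from passing a scipy sparse matrix, since NLTK calls len() on its input and sparse matrices refuse that. Converting to a dense array first (for example with X.toarray()) should avoid it. The example below uses a small made-up dense array in place of a real TF-IDF matrix; the values, the name X_dense, the fixed seed, and the choice of avoid_empty_clusters=True are my own assumptions for illustration. Note that the keyword for getting per-row labels back is assign_clusters.

```python
import random

import numpy as np
from nltk.cluster import KMeansClusterer
from nltk.cluster.util import cosine_distance

# Toy dense rows standing in for TF-IDF vectors (made-up values).
# Each consecutive pair points in the same direction with a different length,
# so cosine distance treats the two rows of a pair as identical.
X_dense = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 2.0, 1.0],
    [0.5, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
])
# For a scipy sparse TF-IDF matrix X, the dense equivalent would be X.toarray().

NUM_CLUSTERS = 3
kclusterer = KMeansClusterer(
    NUM_CLUSTERS,
    distance=cosine_distance,
    repeats=10,
    rng=random.Random(0),       # fixed seed so runs are repeatable
    avoid_empty_clusters=True,  # keeps a centroid defined if a cluster empties
)

# NLTK expects a sequence of 1-D numpy vectors.
# assign_clusters=True returns one cluster index per input row.
labels = kclusterer.cluster(list(X_dense), assign_clusters=True)
print(labels)
```

Rows that differ only in length always end up in the same cluster here, which is the behaviour you want from cosine distance on documents of different sizes. Keep in mind that toarray() materialises the full matrix, so this is fine for something like a (15, 155) matrix but may not scale to a large corpus.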
