It's fairly easy to create document vectors and cluster them using Apache Mahout. Running clusterdump lets the user view the terms associated with each cluster. However, how can I identify which documents belong to each cluster?

Thanks

1 Answer

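For each document, I would compute the Euclidean distance between its vector and each cluster center (centroid), then assign the document to the closest cluster. If the clustering run used a different distance measure, use that same measure here so the assignment matches how the clusters were formed.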
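Below is a minimal sketch of the nearest-centroid assignment described above. It assumes the document vectors and cluster centers have already been exported from Mahout into plain Python lists of floats (Mahout's own vectors are usually sparse, so this dense form is a simplification). It does not use any Mahout API. The function names and the cluster ids such as "C-0" are hypothetical and only illustrate the idea.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_clusters(
    docs: Dict[str, Sequence[float]],
    centroids: Dict[str, Sequence[float]],
) -> Dict[str, str]:
    """Map each document id to the id of its nearest centroid.

    Ties go to whichever centroid comes first in the dict's iteration order,
    because min() returns the first minimal element.
    """
    if not centroids:
        raise ValueError("need at least one centroid")
    return {
        doc_id: min(centroids, key=lambda cid: euclidean(vec, centroids[cid]))
        for doc_id, vec in docs.items()
    }


def members_by_cluster(assignment: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the assignment: cluster id -> list of document ids."""
    clusters: Dict[str, List[str]] = {}
    for doc_id, cid in assignment.items():
        clusters.setdefault(cid, []).append(doc_id)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy data: three 2-D document vectors and two centroids.
    docs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
    centroids = {"C-0": [1.0, 0.0], "C-1": [0.0, 1.0]}
    print(members_by_cluster(assign_to_clusters(docs, centroids)))
    # {'C-0': ['d1', 'd2'], 'C-1': ['d3']}
```

If the clusters were built with a different measure, such as cosine distance, swapping that measure in for `euclidean` keeps the assignment consistent with the clustering.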