I've been using Mahout to run k-means clustering on text documents, using both XML and Solr index input.
The clustering appears to work, and similar documents are indeed being put in the same k-means cluster, which is great.
However, whenever I display the GraphML output from ClusterDump (--outputFormat GRAPH_ML), the plot shows all the clusters, but each element sits on the circumference of its parent cluster. In other words, every element is at approximately the same radius from the centroid.
I was expecting the elements to be scattered throughout the cluster depending on their similarity to each other (as in the Mahout examples).
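One check that might separate a data effect from a rendering effect is to measure how much the point-to-centroid distances actually vary within a cluster. Below is a minimal sketch in plain Python. It does not use the Mahout API, and the function name and sample vectors are made up for illustration; it assumes the document vectors and the centroid have been exported as plain lists of numbers.

```python
import math

def distance_spread(points, centroid):
    """Return (mean distance, relative spread) of the points around the centroid.

    Relative spread is the standard deviation of the Euclidean distances
    divided by their mean (0.0 when every point is equally far away).
    """
    dists = [
        math.sqrt(sum((p - c) ** 2 for p, c in zip(point, centroid)))
        for point in points
    ]
    mean = sum(dists) / len(dists)
    variance = sum((d - mean) ** 2 for d in dists) / len(dists)
    spread = math.sqrt(variance) / mean if mean > 0 else 0.0
    return mean, spread

# Hypothetical sample data, not real Mahout output.
ring = [(3.0, 4.0), (-3.0, 4.0), (5.0, 0.0), (0.0, -5.0)]       # all at distance 5
scattered = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -4.0)]  # distances 1, 2, 3, 4
origin = (0.0, 0.0)

ring_mean, ring_spread = distance_spread(ring, origin)
scattered_mean, scattered_spread = distance_spread(scattered, origin)
```

A spread close to zero would suggest the elements really are nearly equidistant from their centroid, so the ring-shaped plot reflects the data. A clearly larger spread would suggest the layout is not using those distances when it places the elements.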
Has anyone seen anything similar with their Mahout k-means clusters? I have tried to get to the bottom of this myself, but any hints or suggestions would be a huge help.
Many thanks,
P Morris