
The wikibook on k-means clustering (http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/K-Means) gives an example cluster analysis:

Can the code be amended so that a label is generated for each cluster? The graph below does not indicate what is being compared. There are three clusters, but what is the name of each cluster?

[Image: cluster plot produced by the wikibook example code]

Here is the code that generates the graph:

# import data (assume that all data in "data.txt" is stored as comma-separated values, with a header row and row names in the first column)
x <- read.csv("data.txt", header=TRUE, row.names=1)

# run K-Means with 3 clusters and at most 15 iterations
km <- kmeans(x, centers = 3, iter.max = 15)

# print components of km
print(km)

# plot clusters
plot(x, col = km$cluster)
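# plot centers (one colour per centre, so each star matches its cluster's colour;
# col = 1:2 would be recycled and draw the third centre in the first cluster's colour)
points(km$centers, col = 1:3, pch = 8)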
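If data.txt is not at hand, the same steps can be tried on synthetic data. The sketch below assumes three well-separated 2-D blobs generated with rnorm (made-up data, not the wikibook's dataset) and calls set.seed so the run is reproducible. The nstart argument is an addition not in the original code: it restarts k-means from several random sets of centres and keeps the best result.

```r
# Reproducible toy data: three hypothetical 2-D blobs (not the wikibook data)
set.seed(42)
x <- data.frame(
  a = c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 10)),
  b = c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 0))
)

# Run k-means with 3 centres, at most 15 iterations, 10 random starts
km <- kmeans(x, centers = 3, iter.max = 15, nstart = 10)

# Colour each point by its cluster id and mark the centres with stars
plot(x, col = km$cluster)
points(km$centers, col = seq_len(nrow(km$centers)), pch = 8, cex = 2)
```

km$cluster holds one cluster id (1, 2 or 3) per row of x, and km$centers holds one row per cluster, which is why a colour vector of length 3 lines up with the centres.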
The clusters are not labelled in the plot you show, but they are coloured by cluster (e.g. red points are from one cluster, black points are from another, etc.). What do you mean when you say "name of a cluster"? - ialm
@ialm To describe which clusters are close to each other, it may be useful for each cluster to have a label, but colours are sufficient, thanks. - blue-sky
I also added an answer to plot the actual labels, if you really want to! - ialm

1 Answer


As I mentioned in the comments, the clusters are already "labelled" by colour, where different colours are associated with cluster membership. To plot the "cluster labels" instead, you can use:

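# set up the axes without drawing any points
plot(x, type='n')
# write each observation's cluster number at its position, coloured by cluster
text(x, labels=km$cluster, col=km$cluster)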

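This plots each observation's cluster number (its "cluster name") in place of the point itself, and colours each label by its cluster. It assumes x has two columns, so that plot(x) produces a single 2-D scatter plot.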
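If descriptive names are wanted rather than bare numbers, one option is to write a name at each cluster centre and add a legend. The sketch below is only an illustration: the toy data and the names "Cluster 1", "Cluster 2", "Cluster 3" are hypothetical placeholders. Because k-means numbers its clusters arbitrarily, meaningful names would have to be chosen after inspecting km$centers.

```r
# Toy data standing in for data.txt (hypothetical, two columns)
set.seed(1)
x <- data.frame(
  a = c(rnorm(40, 0), rnorm(40, 6), rnorm(40, 12)),
  b = c(rnorm(40, 0), rnorm(40, 6), rnorm(40, 0))
)
km <- kmeans(x, centers = 3, nstart = 10)

# Hypothetical display names, one per cluster id (1, 2, 3)
cluster_names <- paste("Cluster", seq_len(nrow(km$centers)))

# Per-point name, looked up from each point's cluster id
point_labels <- cluster_names[km$cluster]

# Draw the points coloured by cluster, then write each name above its centre
plot(x, col = km$cluster, pch = 20)
points(km$centers, col = seq_along(cluster_names), pch = 8, cex = 2)
text(km$centers, labels = cluster_names, col = seq_along(cluster_names),
     pos = 3, font = 2)

# A legend maps colours to names without cluttering the points
legend("topright", legend = cluster_names,
       col = seq_along(cluster_names), pch = 20)
```

Indexing cluster_names by km$cluster gives every observation a readable label, which could also be passed to text(x, labels = point_labels, ...) in place of the numeric ids.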