0
votes

I have a dataset with 1599 observations and 10 attributes on which I need to do k-means clustering. I ran kmeans with 6 clusters, and I can see the cluster centers, sizes, etc., and which observation lies in which cluster. Now I need to plot these results so that a single plot shows the following: one of the 10 attributes of my original data on the x-axis, another attribute on the y-axis, and all 1599 observations, each colored according to which of the 6 clusters it belongs to. That gives 10C2 = 45 plots. Basically, this should tell me that cluster 1 is high/medium/low in terms of a particular attribute, while cluster 2 is so-and-so, and so on for all 6 clusters.

I tried the function plotcluster from the fpc package, but from what I understood, it maps the data into 2D using PCA and then plots the clusters in terms of 2 dimensions that are different from the original attributes. So if I say cluster 1 is low in dim1, it wouldn't really make much sense.

Is there a function that does what I want, or should I just append the '$cluster' information from the kmeans output to my original data and plot 2 columns of my data at a time using the basic function plot()?

1
Hi, if any answer solves your problem, can you click "accept" on it so that other people can see it? Thanks. - agenis

1 Answer

0
votes

Here is one solution, probably not the simplest (it uses a for loop), but it seems to do what you need:

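df <- mtcars                               # example data: 32 observations, 11 attributes
vars <- names(df)                          # attribute names, taken before the cluster column is added
set.seed(1)                                # kmeans starts from random centers
df$cluster <- factor(kmeans(df[, vars], centers = 6)$cluster)
mycomb <- combn(vars, 2)                   # every pair of attributes, one pair per column
for (xy in seq_len(ncol(mycomb))) {        # 55 pairs for mtcars, 45 for 10 attributes
  plot(x = df[, mycomb[1, xy]],
       y = df[, mycomb[2, xy]],
       col = as.numeric(df$cluster),       # one color per cluster
       xlab = mycomb[1, xy],
       ylab = mycomb[2, xy])
}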
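
If one figure with all attribute pairs is acceptable instead of separate plots, base R's pairs() may be enough. The following is only a sketch: mtcars stands in for the original data, and the seed and the point styling (pch, cex) are arbitrary choices of mine, not anything from the question.

```r
# Sketch: all pairwise scatterplots in one figure, colored by k-means cluster.
# mtcars and the seed are illustrative assumptions, not the asker's data.
df <- mtcars
set.seed(1)
km <- kmeans(df, centers = 6)

# One panel per attribute pair; col gives each observation its cluster's color.
pairs(df, col = km$cluster, pch = 19, cex = 0.6)

# The cluster centers are the per-cluster means of each attribute, which helps
# to judge whether a cluster is high or low on a given attribute.
print(round(km$centers, 2))
```

With 10 or more attributes the panels get small, so the loop version may still be easier to read for individual pairs.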