I just started using R. I need to plot the total within-cluster sum of squares from K-means clustering on my data for 2 through 20 clusters.

Here is my code:

w <- numeric(20)
for (k in 2:20) {
kf <- kmeans(whs2018annexBdatscl,k,nstart=100)
w[k] <- kf$tot.withinss
}
plot(2:20,w,type = "b", lwd= 2, pch= 19, xlab="K", ylab = expression(SS[within]))

I get this error: Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

When I plot against 1:20 it works, but I'm supposed to plot 2:20. What am I doing wrong?

It should be 1:20, not 2:20, in your plot. There are 20 elements in w, so your x axis needs 20 elements too. Your loop starts at 2, but w still has a first element: it is never assigned, so it keeps the initial 0 from numeric(20). - PKumar
FYI, you have a habit recently of asking questions without the ability for us to reproduce them. We don't have your data, have no idea how it is structured, and therefore have to speculate. That is contributing to your long run of unanswered questions. I strongly suggest you adapt how you ask questions to make them self-contained and reproducible; see stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info. Thanks, and good luck. - r2evans

1 Answer

It appears that you never assign to w[1], so just drop that element when plotting:

plot(2:20, w[-1],
     type = "b", lwd= 2, pch= 19, xlab="K", ylab = expression(SS[within]))

The reason for the error is straightforward: if plot(1:2, 3:4) plots two points, what should plot(c(1,2,3), c(4,5)) plot? The vectors need to be the same length, and this is one area where R does not "recycle" its arguments (for better or worse). In your case w has 20 elements (w[1] is still the 0 from numeric(20)), while 2:20 has only 19.
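
If you'd rather avoid the unused w[1] slot altogether, one option is to build w directly from the vector of K values, so the two always have matching lengths. This is only a sketch: the original data isn't available, so it assumes the built-in iris measurements (scaled) as a stand-in for whs2018annexBdatscl, and it uses a smaller nstart to keep the run short.

```r
# Assumption: scaled iris measurements stand in for whs2018annexBdatscl,
# which is not available here.
set.seed(42)                         # kmeans() uses random starting centers
dat <- scale(iris[, 1:4])            # 150 x 4 numeric matrix

# The original setup: 20 slots but only 19 K values, hence the length error.
w_old <- numeric(20)
length(w_old)                        # 20
length(2:20)                         # 19

# Alternative: compute one value per K, so x and y always line up.
ks <- 2:20
w <- sapply(ks, function(k) {
  kmeans(dat, centers = k, nstart = 25)$tot.withinss
})

stopifnot(length(ks) == length(w))   # both have 19 elements

plot(ks, w, type = "b", lwd = 2, pch = 19,
     xlab = "K", ylab = expression(SS[within]))
```

Because w is derived from ks, changing the range (say to 3:15) needs no other edits, and there is no leftover placeholder element to remember to drop.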