I am trying to determine the optimal number of clusters.

# Determine optimal number of clusters
wss<-rep(0,2)
wss[1]<-sum(scale(price[,2:2],scale=FALSE)^2)
for(i in 2:16)
wss[i]<-sum(kmeans(price[,2:2],centers=i)$withinss)
plot(4:2,wss,type="b",xlab="Number of clusters",ylab="Within-cluster sum of squares")

Every line works except the last one. The last one gives an error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

I have tried some solutions from other questions, but no luck. Any advice? Thanks a ton!

SAMPLE DATA:

    Country        Price

    Albania        1.57
    Andorra        1.24
    Azerbaijan     0.47
    Austria        1.33
    Belarus        0.73
    Belgium        1.54
    Bosnia & Herz. 1.29
    Bulgaria       1.13
    Croatia        1.44
    Czech Rep.     1.32
    Cyprus         1.28
    Denmark        1.74
    Estonia        1.41
    Finland        1.61
    France         1.67
    Georgia        0.9
Welcome to SO. You need to share sample data with dput and the expected output - Sonny
Afghanistan 0,67 Albania 1,57 Algeria 0,35 Andorra 1,24 Angola 0,52 This is sample data, there are actually 47 rows. I would expect it to have a "curve" as an output to see what is the optimal clustering count. - Denis
i.stack.imgur.com/VVzNR.png - expected output sample - Denis
use dput(yourData) or dput(head(yourData)) to share data - Iman
Sample data should now be visible. Thanks. - Denis

1 Answer

wss (the y variable) has length 16, because the loop assigns up to wss[16], but the x-axis you pass is 4:2, which has length 3. plot() needs x and y to be the same length, which is why you get the error.
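
A quick way to see the mismatch for yourself is to check the lengths directly. This is only a sketch: the zeros assigned in the loop are placeholders standing in for the kmeans results, so it runs without your data.

```r
wss <- rep(0, 2)               # starts with length 2, as in the question
for (i in 2:16) wss[i] <- 0    # placeholder values; assigning past the end grows the vector
length(wss)                    # 16
4:2                            # 4 3 2
length(4:2)                    # 3
```

R silently extends a vector when you assign beyond its current length, so the initial rep(0,2) does not limit wss to two elements.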

Change 4:2 to 1:16 so that x and y have the same length and wss[i] is plotted against i clusters. Like:

plot(1:16,wss,type="b",xlab="Number of clusters",ylab="Within-cluster sum of squares")
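
For reference, here is a self-contained sketch of the whole elbow plot that avoids hard-coding the x range by using seq_along(wss), so x always matches the length of wss. It uses only the 16 sample rows from the question. The names max_k and k, the seed, and the upper bound of 8 clusters are my own assumptions, chosen so that k stays well below the number of rows in the sample. With the full 47 rows you could raise max_k.

```r
# Sample data copied from the question (16 of the 47 rows).
price <- data.frame(
  Country = c("Albania", "Andorra", "Azerbaijan", "Austria", "Belarus",
              "Belgium", "Bosnia & Herz.", "Bulgaria", "Croatia",
              "Czech Rep.", "Cyprus", "Denmark", "Estonia", "Finland",
              "France", "Georgia"),
  Price = c(1.57, 1.24, 0.47, 1.33, 0.73, 1.54, 1.29, 1.13, 1.44,
            1.32, 1.28, 1.74, 1.41, 1.61, 1.67, 0.90)
)

set.seed(42)  # kmeans starts from random centers; the seed value is arbitrary
max_k <- 8    # assumed upper bound; keep it below the number of distinct rows

wss <- numeric(max_k)
# k = 1: total sum of squares around the overall mean
wss[1] <- sum(scale(price$Price, scale = FALSE)^2)
for (k in 2:max_k) {
  wss[k] <- sum(kmeans(price$Price, centers = k)$withinss)
}

# seq_along(wss) is 1:max_k, so x and y always have the same length
plot(seq_along(wss), wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Within-cluster sum of squares")
```

Because the x values come from wss itself, changing max_k later cannot reintroduce the "'x' and 'y' lengths differ" error.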