I need to cluster categorical data. I am using the following k-modes code to build the clusters and to check the optimal number of clusters with the elbow method:
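library(klaR)  # kmodes() is assumed to come from the klaR package

# Fit k-modes with 5 clusters
set.seed(100000)
cluster.results <- kmodes(data_cluster, 5, iter.max = 100, weighted = FALSE)
print(cluster.results)

# Elbow method: total within-cluster difference for k = 1..k.max
k.max <- 20
wss <- sapply(1:k.max, function(k) {
  set.seed(100000)  # same seed for every k so the runs are comparable
  sum(kmodes(data_cluster, k, iter.max = 100, weighted = FALSE)$withindiff)
})
wss

plot(1:k.max, wss,
     type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K",
     ylab = "Total within-cluster simple-matching distance")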
My questions are:
- Is there any other method for choosing the optimal number of clusters with k-modes?
- Each seed gives different cluster sizes, so I am trying several seeds and keeping the one with the lowest total within-cluster difference (the sum of withindiff). Is this approach correct?
- How can I check whether my clusters are stable?
- I want to apply these clusters to new data (from another year), i.e. predict cluster membership for it. How can I do that?
- Are there other methods for clustering categorical data?
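
To make the second question concrete, the seed search I have in mind looks roughly like the sketch below. It assumes kmodes() comes from the klaR package. The toy data frame and the names toy, seeds, total_diff and best_seed are made up for illustration and stand in for my real data_cluster.

```r
library(klaR)  # assumption: kmodes() is the klaR implementation

# Toy categorical data standing in for data_cluster (hypothetical values).
set.seed(1)
toy <- data.frame(
  a = sample(c("x", "y", "z"), 200, replace = TRUE),
  b = sample(c("p", "q"), 200, replace = TRUE),
  c = sample(c("u", "v", "w"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# Fit k-modes once per seed and record the total within-cluster difference.
seeds <- 1:10
total_diff <- sapply(seeds, function(s) {
  set.seed(s)
  sum(kmodes(toy, 3, iter.max = 100, weighted = FALSE)$withindiff)
})

# Keep the seed whose run has the smallest total within-cluster difference.
best_seed <- seeds[which.min(total_diff)]
best_seed
```

This is effectively a manual multi-start: k is fixed (3 here, chosen arbitrarily) and only the random initial modes change between runs.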
klaR, for instance?) and a minimal amount of data that we can use to reproduce your problem. For instance, you can paste the output of dput(data_cluster). – hpesoj626