How to calculate BIC for k-means clustering in R

Question

I've been using k-means to cluster my data in R but I'd like to be able to assess the fit vs. model complexity of my clustering using Baysiean Information Criterion (BIC) and AIC. Currently the code I've been using in R is:

KClData <- kmeans(Data, centers=2, nstart= 100)

But I'd like to be able to extract the BIC and Log Likelihood. Any help would be greatly appreciated!

Roland, thanks for the tip! I'm actually trying to compare the results of k-means to Mclust outputs which is why I'd like to use the BIC from my k-means clustering to GMM that Mclust uses. — UnivStudent
I am not an expert, but think that k-means is not a maximum likelihood algorithm. Are you sure that AIC and BIC are applicable? — Roland
It does have a log Likelihood associated with it but I'm having trouble finding it and implementing it in R. — UnivStudent
See a similar question on a statistician community stats.stackexchange.com/q/55147/3277 — ttnphns

Andy Clifton Andy Clifton · Accepted Answer · 2014-08-28T20:09:09

For anyone else landing here, there's a method proposed by Sherry Towers at http://sherrytowers.com/2013/10/24/k-means-clustering/, which uses output from stats::kmeans. I quote:

The AIC can be calculated with the following function:

kmeansAIC = function(fit){

m = ncol(fit$centers)
n = length(fit$cluster)
k = nrow(fit$centers)
D = fit$tot.withinss
return(D + 2*m*k)
}

From the help for stats::AIC, you can also see that the BIC can be calculated in a similar way to the AIC. An easy way to get the BIC is to replace the return() in the above function, with this:

return(data.frame(AIC = D + 2*m*k,
                  BIC = D + log(n)*m*k))

So you would use this as follows:

fit <- kmeans(x = data,centers = 6)
kmeansAIC(fit)

How to calculate BIC for k-means clustering in R

4 Answers