2
votes

To get the distortion function (the sum of the distances from each point to its center) when doing K-means clustering with scikit-learn, one simple way is to take the centers (k_means.cluster_centers_) and sum up the distance from each point to its assigned center.
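A minimal sketch of that manual approach, for reference. The iris data set, the fixed `random_state`, and the `n_init` value are assumptions chosen only to make the example self-contained; they are not part of the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumed example data; any (n_samples, n_features) array would do.
X = load_iris().data
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Center assigned to each sample, looked up through its cluster label.
assigned_centers = k_means.cluster_centers_[k_means.labels_]

# Euclidean distance from each sample to its assigned center.
distances = np.linalg.norm(X - assigned_centers, axis=1)

# Distortion as described above: the sum of those distances.
total_distance = distances.sum()
print(total_distance)
```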

Is there a faster way, in terms of programmer time? Something like a direct function call.

1
I'm guessing there is, since getting the cluster centers implies summing the distances anyway. - Joel Cornett

1 Answer

7
votes

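This is already computed at fit time and stored in the inertia_ attribute of the KMeans class. Note that inertia_ is the sum of squared distances from each sample to its closest cluster center, not the sum of the plain distances.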

>>> from sklearn.datasets import load_iris
>>> from sklearn.cluster import KMeans
>>> iris = load_iris()
>>> km = KMeans(n_clusters=3).fit(iris.data)
>>> km.inertia_
78.940841426146108
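
As a rough cross-check, the sketch below recomputes the value from the fitted model with `transform`, which returns the distance from each sample to every center. The fixed `random_state` and `n_init` are assumptions added for reproducibility, so the fitted value may differ from the one printed above. It also shows the sum of unsquared distances, in case that is the quantity actually wanted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# transform() gives the distance from every sample to every center;
# the row-wise minimum is the distance to the closest center.
closest = km.transform(X).min(axis=1)

squared_sum = np.sum(closest ** 2)  # expected to agree with km.inertia_
plain_sum = closest.sum()           # sum of unsquared distances

print(np.isclose(squared_sum, km.inertia_))
print(plain_sum)
```

If only the squared version is needed, reading `km.inertia_` directly avoids the extra pass over the data.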