Scikit-learn clustering always gives a one-point cluster

Question

I'm using scikit-learn's KMeans on my dataset with 6 clusters, and everything seems fine.

However, immediately after fitting the k-means model, I group by the labels and get the following counts:

Length: 55003, dtype: int64
0  count    23110
1  count        1
2  count    10923
3  count    17949
4  count     1736
5  count     1284

I always get one cluster that contains only 1 data point. If I save the model and predict on the same data again, the predictions also put one data point in a cluster on its own. What's up with that? Is it a bug in scikit-learn?

This is odd. Are you able to post your data? Also, which versions of numpy and sklearn are you using? - EdChum

Tjorriemorrie · Accepted Answer · 2015-01-13T07:02:48

It was one outlier. I removed it from my data and the groups are appropriate now.
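
For anyone hitting the same thing, here is a minimal sketch of the effect on synthetic data (not my actual dataset). k-means minimizes squared distances to the centroids, so a single point that lies very far from everything else tends to get a centroid all to itself. The blob centers, the outlier at (1000, 1000), and n_clusters=4 are made-up values for illustration, and dropping every singleton cluster is just one simple way to handle it, assuming such points really are bad data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in data: three tight 2-D blobs plus one extreme point.
blobs = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(200, 2))
    for center in [(0, 0), (5, 5), (0, 5)]
])
outlier = np.array([[1000.0, 1000.0]])
X = np.vstack([blobs, outlier])

# Fit k-means and count how many points land in each cluster.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
sizes = np.bincount(kmeans.labels_, minlength=4)
print("cluster sizes with outlier:", sizes)

# Treat any point that sits alone in its cluster as a suspected outlier.
singleton_clusters = np.flatnonzero(sizes == 1)
is_outlier = np.isin(kmeans.labels_, singleton_clusters)
print("suspected outliers:", X[is_outlier])

# Drop the suspected outliers and refit on the remaining points.
X_clean = X[~is_outlier]
kmeans_clean = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_clean)
sizes_clean = np.bincount(kmeans_clean.labels_, minlength=4)
print("cluster sizes after removal:", sizes_clean)
```

Before deleting anything from real data, it is worth looking at the flagged rows first, since a one-point cluster can also be a legitimate but rare observation.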
