
I'm running scikit-learn's KMeans clustering on my dataset with 6 clusters, and everything seems fine.

However, immediately after fitting the KMeans model, I group by the labels and get the following counts:

Length: 55003, dtype: int64
0  count    23110
1  count        1
2  count    10923
3  count    17949
4  count     1736
5  count     1284
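For reference, per-cluster counts like the ones above can be produced roughly as follows. This is only a sketch: the array `X` is synthetic stand-in data (the real dataset is not shown in the question), and `n_init` and `random_state` are illustrative choices rather than the settings actually used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the real dataset (assumption: numeric features, 2 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# Fit KMeans with 6 clusters, as in the question.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)

# Count how many points were assigned to each cluster label.
counts = pd.Series(kmeans.labels_).value_counts().sort_index()
print(counts)
```

A cluster whose count is 1 stands out immediately in this output.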

I always get one cluster that contains only a single data point. If I save the model and predict on the same data again, the predictions still put that one data point in a cluster of its own. What's going on? Is it a bug in scikit-learn?

This is odd. Are you able to post your data? Also, what versions of numpy and sklearn are you using? - EdChum

1 Answer


It was a single outlier. I removed it from my data and the clusters look appropriate now.
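This is expected behaviour rather than a bug: k-means minimizes squared distances to the centroids, so one point far from everything else can be cheapest to fit by giving it a centroid of its own. A minimal sketch of spotting and dropping such a point is below. The data is synthetic (three blobs plus one extreme point), and the names `singleton_clusters` and `X_clean` are made up for illustration; the original data and the exact removal method are not shown in the thread.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumption): three tight blobs of 200 points plus one extreme outlier.
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(200, 2))
    for center in ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])
])
outlier = np.array([[1000.0, 1000.0]])
X = np.vstack([blobs, outlier])

# With one more cluster than there are blobs, the outlier ends up alone.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
sizes = np.bincount(kmeans.labels_, minlength=4)
print(sizes)

# Find clusters that contain exactly one point and drop those points.
singleton_clusters = np.flatnonzero(sizes == 1)
keep = ~np.isin(kmeans.labels_, singleton_clusters)
X_clean = X[keep]

# Refit on the cleaned data; the cluster count here is an illustrative choice.
kmeans_clean = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_clean)
print(np.bincount(kmeans_clean.labels_, minlength=3))
```

Dropping every singleton cluster is a blunt rule. On real data it is worth inspecting the point first, since it may be a data-entry error or a genuine extreme value, and scaling the features beforehand can also change which points look extreme.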