Classify K-means in Text Mining

Question

The goal is to create a computer-generated news site that aggregates headlines from different news sources around the world.

Looking at the centroid table results, I want to understand the following:

https://ibb.co/n1mvnbk

I used K=5 with TF-IDF features.

What do the numbers in the centroid table mean? When an attribute is zero in multiple clusters, what does that mean?
When I sort the centroid table by each cluster in descending order, I find some words (attributes) that have a high value in one cluster and zero in the other clusters. Does this mean that these words occur more or less frequently in that cluster? How should I discuss the clustering model: do all the clusters make sense, and why?

Do you think K=5 is a good choice for this dataset, or should I choose K=3? How can I decide?

Swarit Agarwal · Accepted Answer · 2019-11-14T05:03:01

I believe K=5 is the number of clusters you are asking the algorithm to find in the current dataset. On that basis, 5 centroids are placed in the data, and each data point is assigned to the centroid nearest to it.
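To make the centroid idea concrete, here is a minimal sketch of how one row of a centroid table comes about: each value is the average TF-IDF weight of a term over the documents assigned to that cluster. The terms, weights, and cluster assignments below are made-up toy values, not taken from the dataset in the question, and `centroid` and `top_terms` are hypothetical helper names.

```python
# Toy TF-IDF rows (hypothetical values); each column corresponds to one term.
terms = ["election", "football", "market"]
cluster_docs = {
    0: [[0.8, 0.0, 0.2], [0.6, 0.0, 0.0]],
    1: [[0.0, 0.9, 0.0], [0.0, 0.7, 0.1]],
}


def centroid(rows):
    """Column-wise mean of the TF-IDF vectors assigned to one cluster."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def top_terms(values):
    """Pair each term with its centroid weight, highest weight first."""
    return sorted(zip(terms, values), key=lambda tv: tv[1], reverse=True)


centroids = {label: centroid(rows) for label, rows in cluster_docs.items()}
for label, values in centroids.items():
    print(label, [(term, round(weight, 2)) for term, weight in top_terms(values)])
```

Because TF-IDF weights are never negative, a centroid value of exactly zero means that no document in that cluster contains the term, while a high value means the term carries a lot of weight in that cluster's documents on average.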

Do you think k=5 is a good choice for this dataset? It's hard to say in advance. The result depends on how the algorithm partitions this particular data, so the right value cannot be judged from k alone.

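You might use the Elbow Method to identify a suitable number of clusters for a given dataset. This method is based on WCSS (Within-Cluster Sum of Squares), which is the sum of squared distances between each point and the centroid of its cluster. Compute WCSS for a range of k values and pick the k at which the curve bends, that is, where adding more clusters no longer reduces WCSS by much.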
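As a rough illustration of the Elbow Method, the sketch below runs a small hand-written k-means for several values of k and reports the WCSS of each. Everything in it is an assumption made for illustration: the 2-D toy points stand in for TF-IDF vectors, the function names are hypothetical, and the deterministic farthest-first initialization is a simplification rather than what a library would do. In practice you would more likely use a library; scikit-learn's `KMeans`, for example, exposes the WCSS of a fitted model as `inertia_`.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_wcss(points, k, max_iters=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Toy data (hypothetical): three well-separated groups of three points each.
points = [
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

On this toy data the WCSS falls steeply up to k=3 and only slightly afterwards, so the bend in the curve points to three clusters. On real TF-IDF data the bend is often less sharp, so it is worth treating the elbow as a guide rather than a definitive answer.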
