K-means clustering in sklearn, where the number of clusters is known in advance (it is 2). There are multiple features. The feature values initially have no weights assigned, i.e. they are treated as equally weighted. However, the task is to assign custom weights to each feature in order to get the best possible cluster separation. How can the optimum weight for each feature be determined so as to achieve the best possible separation of the two clusters? (Note that sklearn's sample_weight parameter weights samples, not features, so feature weights have to be applied by scaling the feature values themselves.) If this is not possible for k-means, or for sklearn, I am interested in any alternative clustering solution; the point is that I need a method for automatically determining appropriate weights for multivariate features in order to maximize cluster separation.
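For context, below is a minimal sketch of how per-feature weights would be applied once they are known: standardize the features, multiply each column by its weight, then fit KMeans. The data and the weight vector `w` are made-up placeholders, not values from the actual problem:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data: 200 samples, 4 features (placeholder for the real data set).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])

# Hypothetical per-feature weights (what the question is trying to determine).
w = np.array([1.0, 0.5, 2.0, 0.1])

# Standardize first so the weights act on comparable scales, then scale each
# column by its weight; the squared Euclidean distance used by k-means then
# becomes a per-feature weighted distance.
X_std = StandardScaler().fit_transform(X)
X_weighted = X_std * w

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_weighted)
```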
1 Answer
In the meantime, I have implemented the following: clustering on each component (feature) separately, then calculating the silhouette score, Calinski-Harabasz score, Dunn score and inverse Davies-Bouldin score for each component separately. Those scores are then scaled to the same magnitude and reduced with PCA to a single component, which gives a weight for each feature. This approach seems to produce reasonable results. I suppose a better approach would be a full factorial experiment (DOE), but this simple approach produces satisfactory results as well.
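A minimal sketch of this pipeline, assuming sklearn's silhouette_score, calinski_harabasz_score and davies_bouldin_score as the per-feature quality scores (the Dunn index has no sklearn implementation and is omitted here); the sign flip and min-shift normalization at the end are assumptions added to turn the first principal component into non-negative weights:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)
from sklearn.preprocessing import MinMaxScaler

def feature_weights(X, n_clusters=2, random_state=0):
    """Score each feature by how well it separates the clusters on its own,
    then combine the scores into a single weight per feature via PCA."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, [j]]
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state).fit_predict(col)
        scores.append([
            silhouette_score(col, labels),
            calinski_harabasz_score(col, labels),
            1.0 / davies_bouldin_score(col, labels),  # inverse: higher = better
        ])
    scores = np.asarray(scores)  # shape (n_features, n_scores)

    # Bring the scores to the same magnitude, then collapse them to a single
    # component per feature with PCA.
    scaled = MinMaxScaler().fit_transform(scores)
    pc1 = PCA(n_components=1).fit_transform(scaled).ravel()

    # Orient the component so that larger values mean better separation,
    # then shift and normalize to get non-negative weights summing to 1.
    if np.corrcoef(pc1, scaled.sum(axis=1))[0, 1] < 0:
        pc1 = -pc1
    weights = pc1 - pc1.min()
    return weights / weights.sum()
```

The returned weights can then be applied by multiplying the standardized feature columns before re-running KMeans on the full feature matrix, as in the sketch under the question.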