I have a set of data (5000 points with 4 dimensions) that I have clustered using kmeans in R.
I want to order the points in each cluster by their distance to the center of that cluster.
Very simply, the data looks like this (I am using a subset to test out various approaches):
id Ans Acc Que Kudos
1 100 100 100 100
2 85 83 80 75
3 69 65 30 29
4 41 45 30 22
5 10 12 18 16
6 10 13 10 9
7 10 16 16 19
8 65 68 100 100
9 36 30 35 29
10 36 30 26 22
Firstly, I used the following method to cluster the dataset into 2 clusters:
(result <- kmeans(data, 2))
This returns a kmeans object that has the following components: cluster, centers, etc.
But I cannot figure out how to compare each point and produce an ordered list.
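For illustration, here is a minimal sketch of one way to do that comparison with the components kmeans returns. It assumes the id column is left out of the clustering, uses the 10-row sample above, and introduces the names features, own_center, and dist, which are not part of my original code:

```r
# Sample data from the question; id is kept but not used as a feature
data <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
features <- as.matrix(data[, c("Ans", "Acc", "Que", "Kudos")])

set.seed(1)  # kmeans starts from random centers
result <- kmeans(features, centers = 2, nstart = 10)

# Row i of this matrix is the center of the cluster that point i belongs to
own_center <- result$centers[result$cluster, , drop = FALSE]

# Euclidean distance from each point to its own cluster center
data$cluster <- result$cluster
data$dist    <- sqrt(rowSums((features - own_center)^2))

# Sort by cluster first, then by distance within each cluster
ordered <- data[order(data$cluster, data$dist), ]
print(ordered)
```

Indexing result$centers by result$cluster repeats each center once per member point, so the subtraction lines up row by row without a loop.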
Secondly, I tried the seriation approach as suggested by another SO user here
I use these commands:
clus <- kmeans(scale(x, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)
mns <- sapply(split(x, clus$cluster), function(x) mean(unlist(x)))
result <- x[order(order(mns)[clus$cluster]), ]
This seems to produce an ordered list, but if I bind it to the cluster labels (using the following cbind command):
result <- cbind(x[order(order(mns)[clus$cluster]), ],clus$cluster)
I get the following result, which does not appear to be ordered correctly:
id Ans Acc Que Kudos clus
1 3 69 65 30 29 1
2 4 41 45 30 22 1
3 5 10 12 18 16 2
4 6 10 13 10 9 2
5 7 10 16 16 19 2
6 9 36 30 35 29 2
7 10 36 30 26 22 2
8 1 100 100 100 100 1
9 2 85 83 80 75 2
10 8 65 68 100 100 2
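One likely reason the labels look wrong: the rows of x are reordered, but clus$cluster is bound in its original order, so each label ends up next to a different row. The sketch below applies the same index to both. It assumes x is the sample data with id in the first column, and the name ord is introduced here for illustration. Note that this sort key depends only on the cluster label, so it groups rows by cluster but does not rank points by distance to the center.

```r
# Sample data from the question; the first column is the id
x <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)

set.seed(1)  # kmeans starts from random centers
clus <- kmeans(scale(x[, -1], scale = FALSE), centers = 3,
               iter.max = 50, nstart = 10)

# Overall mean of each cluster, as in the commands above
mns <- sapply(split(x[, -1], clus$cluster), function(g) mean(unlist(g)))

# Compute the row order once ...
ord <- order(order(mns)[clus$cluster])

# ... and apply it to both the rows and the labels so they stay aligned
result <- cbind(x[ord, ], clus = clus$cluster[ord])
print(result)
```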
I don't want to write commands willy-nilly; I want to understand how the approach works. If anyone could help out or shed some light on this, it would be really great.
EDIT:
As the clusters can be easily plotted, I'd imagine there is a more straightforward way to get and rank the distances between points and the center.
The centers for the above clusters (when using k = 2) are as follows. But I do not know how to get and compare this with each individual point.
Ans Accep Que Kudos
1 83.33333 83.66667 93.33333 91.66667
2 30.28571 30.14286 23.57143 20.85714
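As a worked example of comparing one point with these centers, the sketch below takes the point with id 8 and computes its Euclidean distance to each center. The centers are typed in from the printed output above (so they are rounded), the column is named Acc to match the data, and the names centers, p8, and d are illustrative only:

```r
# Centers as printed above (k = 2), rounded to 5 decimals
centers <- rbind(
  "1" = c(Ans = 83.33333, Acc = 83.66667, Que = 93.33333, Kudos = 91.66667),
  "2" = c(Ans = 30.28571, Acc = 30.14286, Que = 23.57143, Kudos = 20.85714)
)

# Point with id 8 from the sample data
p8 <- c(Ans = 65, Acc = 68, Que = 100, Kudos = 100)

# Subtract the point from every center, square, sum per row, take the root
d <- sqrt(rowSums(sweep(centers, 2, p8)^2))
print(d)

# The nearest center is the cluster the point belongs to
which.min(d)
```

Repeating this for every point in a cluster and sorting the resulting distances would give the ranking I am after.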
NB:
I don't need to use kmeans, but I want to specify the number of clusters and retrieve an ordered list of points from those clusters.
kmeans() also returns the final cluster centers. From there, it shouldn't be too hard to compute the distance from each point to the center of its cluster. – user554546
kmeans()). You have the cluster assignments and centers of each cluster. What, exactly, is confusing you about computing distances between each point and the center of that point's cluster? – user554546