I am trying to run the SimpleKMeans algorithm in Weka and understand its results.
This is my training data:
@relation weather_clustered
@attribute Instance_number numeric
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
@attribute cluster {cluster0,cluster1,cluster2,cluster3,cluster4,cluster5}
@data
0,sunny,85,85,FALSE,no,cluster3
1,sunny,80,90,TRUE,no,cluster5
2,overcast,83,86,FALSE,yes,cluster2
4,rainy,68,80,FALSE,yes,cluster4
Then I run SimpleKMeans with numClusters=2 and seed=10. I want to see how the clustering results relate to the attribute cluster; in other words, I want to see which of the two k-means clusters each value clusterx ends up in. Note that I do not assume that the attribute cluster is the correct clustering.
To see the correspondence in the output, I set "Classes to clusters evaluation" to (Nom) cluster
and get the following results:
Class attribute: cluster
Classes to Clusters:
0 1 <-- assigned to cluster
0 0 | cluster0
0 0 | cluster1
1 0 | cluster2
0 1 | cluster3
1 0 | cluster4
0 1 | cluster5
Cluster 0 <-- cluster2
Cluster 1 <-- cluster3
Incorrectly clustered instances : 2.0 50 %
I like the list showing the correspondence; this is exactly what I need. However, I don't understand what the following means:
Cluster 0 <-- cluster2
Cluster 1 <-- cluster3
In addition, I am confused by the following result:
Incorrectly clustered instances : 2.0 50 %
Where does it come from? How does Weka know the correct result? I don't have a correct result. Maybe it treats the attribute cluster as the correct clustering. In short, I don't understand the output.
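One reading that would at least reproduce these numbers is sketched below in Python. It assumes (and this is only my guess, not Weka's actual code) that each k-means cluster is paired with one distinct value of the attribute cluster so that the number of matching instances is as large as possible, and that every instance outside those pairs is counted as incorrectly clustered. The function name best_mapping is made up for illustration; the counts are copied from the matrix above.

```python
from itertools import permutations

# Counts copied from the "Classes to Clusters" matrix above:
# keys are values of the attribute "cluster", list entries are k-means clusters 0 and 1.
counts = {
    "cluster0": [0, 0],
    "cluster1": [0, 0],
    "cluster2": [1, 0],
    "cluster3": [0, 1],
    "cluster4": [1, 0],
    "cluster5": [0, 1],
}


def best_mapping(counts, num_clusters):
    """Pair each k-means cluster with a distinct class value, maximizing matches.

    Hypothetical helper: a brute-force search, assumed (not confirmed) to
    mirror what the evaluation does.
    """
    best, best_hits = None, -1
    for labels in permutations(counts, num_clusters):
        # labels[k] is the class value tentatively paired with k-means cluster k
        hits = sum(counts[label][k] for k, label in enumerate(labels))
        if hits > best_hits:  # ties keep the first pairing found
            best, best_hits = labels, hits
    return best, best_hits


total = sum(sum(row) for row in counts.values())  # 4 instances
mapping, hits = best_mapping(counts, 2)
incorrect = total - hits
print(mapping, incorrect, 100.0 * incorrect / total)
```

Under this assumption only two of the four instances can ever match, because there are two clusters but four distinct values in use, which would give 2 incorrect instances (50 %). The choice of cluster2 over cluster4, and of cluster3 over cluster5, would then just be a tie broken arbitrarily.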