I'm working on a text mining/clustering project and am trying to create a table with the number of clusters as rows and 6 columns for the following metrics: max.diameter, min.separation, average.within, average.between, avg.silwidth, dunn.

I need to create the tables for 3 methods - kmeans, pam and hclust.

I was able to create something for kmeans:

dtm0.90Dist = dist(dtm0.90)

foreachcluster = function(k) {
  kmeans.result = kmeans(dtm0.90, k)
  kmeans.stats = cluster.stats(dtm0.90Dist, kmeans.result$cluster)
  c(kmeans.stats$min.separation, kmeans.stats$max.diameter,
    kmeans.stats$average.within, kmeans.stats$avearge.between,
    kmeans.stats$avg.silwidth, kmeans.stats$dunn)
}
rbind(foreachcluster(2), foreachcluster(3), foreachcluster(4), foreachcluster(5),
      foreachcluster(6), foreachcluster(7), foreachcluster(8))

and I get the following output:

         [,1]     [,2]     [,3]      [,4]       [,5]
[1,] 3.162278 30.19934 5.831550 0.5403872 0.10471348
[2,] 2.236068 28.37252 5.006058 0.3923446 0.07881104
[3,] 1.000000 28.37252 4.995478 0.2496066 0.03524537
[4,] 1.000000 26.40076 4.387212 0.2633338 0.03787770
[5,] 1.000000 26.40076 4.353248 0.2681947 0.03787770
[6,] 1.000000 26.40076 4.163757 0.1633954 0.03787770
[7,] 1.000000 26.40076 4.128927 0.2676423 0.03787770

I need similar output for the hclust and pam methods, but for the life of me I can't get the same function to work for either of them.

OK, so I was able to write a function for hclust:

forhclust = function(k) {
  dfDist = dist(dtm0.90)
  hclust.result = hclust(dfDist)
  hclust.cluster = cutree(hclust.result, k)
  cluster.stats(dfDist, hclust.cluster)
  c(cluster.stats$min.separation)
}

But I get an error when I run it:

Error in cluster.stats$min.separation : 
  object of type 'closure' is not subsettable

What I need is for it to print the "min.separation" output.

I would really appreciate all the help and perhaps some guidance in understanding why my approach is failing in hclust.

Also, is there a good source that can explain the functioning and application of these methods, step by step, in detail?

Thank You

1 Answer

library(fpc)  # provides cluster.stats()

# mDist is the distance object for your data, i.e. the result of dist()
foreachcluster2 = function(k) {
  hc = hclust(mDist, method = "average")
  hresult = cutree(hc, k)
  # Assign the result; cluster.stats itself is a function and cannot be subset with $
  h.stats = cluster.stats(mDist, hresult)
  c(max.dia  = h.stats$max.diameter,
    min.sep  = h.stats$min.separation,
    avg.wi   = h.stats$average.within,
    avg.bw   = h.stats$average.between,
    silwidth = h.stats$avg.silwidth,
    dunn     = h.stats$dunn)
}
# One row of metrics per number of clusters, k = 2..14
t2 = do.call(rbind, lapply(2:14, foreachcluster2))
rownames(t2) = 2:14
t2

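This should work. Your hclust version failed because the result of cluster.stats() was never assigned to a variable, so cluster.stats$min.separation tried to subset the function itself, which is what "object of type 'closure' is not subsettable" means. For pam():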

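library(cluster)  # provides pam()

pamC <- pam(x = m, k = 2)  # m is your data matrix
pamC
pamC$clustering            # cluster assignments

Use $clustering instead of $cluster when passing the assignments to cluster.stats(); the rest is the same.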
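
Putting the pam() case together, the table could be built roughly like this. This is only a sketch: pam_metrics, toy, toyDist and t3 are names made up for illustration, and the random toy matrix stands in for your document-term matrix.

```r
library(cluster)  # pam()
library(fpc)      # cluster.stats()

# Hypothetical helper (not from the question): one row of validation
# metrics for a k-cluster PAM solution on the distance object d.
pam_metrics <- function(d, k) {
  pam.result <- pam(d, k = k, diss = TRUE)      # d is treated as dissimilarities
  s <- cluster.stats(d, pam.result$clustering)  # note: $clustering, not $cluster
  c(max.dia  = s$max.diameter,
    min.sep  = s$min.separation,
    avg.wi   = s$average.within,
    avg.bw   = s$average.between,
    silwidth = s$avg.silwidth,
    dunn     = s$dunn)
}

# Made-up toy data standing in for dtm0.90
set.seed(1)
toy <- matrix(rnorm(60), ncol = 3)
toyDist <- dist(toy)

ks <- 2:5
t3 <- do.call(rbind, lapply(ks, function(k) pam_metrics(toyDist, k)))
rownames(t3) <- ks
t3
```

Naming the elements inside c() also makes a misspelled field easy to spot: $ on a list returns NULL for a name it does not know, and c() silently drops NULL, so a vector of six metrics can end up with only five columns.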