The workflow I want to implement is:

dm <- dist(data)
dend <- hclust(dm)
k <- stats::cutree(dend, k = 10)
data$clusters <- k
plot(dend, colorBranches = k) # ??? colorBranches is made up; what can I use here?

So I searched for "color dendrogram branches using cutree output". All I found is dendextend.

The problem is that I am failing to implement the workflow with dendextend.

This is what I came up with, but I would also like the cluster labels to be shown:

library(dendextend)
hc <- hclust(dist(USArrests))
dend <- as.dendrogram(hc)

kcl <- dendextend::cutree(dend, k = 4)
dend1 <- color_branches(dend, clusters = kcl[order.dendrogram(dend)], groupLabels = TRUE) %>%
  set("labels_cex", 1)
plot(dend1, main = "Dendrogram dist JK")

The group labels are not shown.

Also, trying something like groupLabels = 1:4 does not help.

Specifying the number of clusters with the k parameter (instead of clusters) does make groupLabels work. Unfortunately, the labels then differ from those generated by dendextend's own cutree method.

Note that here cluster 4 has 2 members.

> table(kcl)
kcl
 1  2  3  4 
14 14 20  2

The post "r dendrogram - groupLabels not match real labels (package dendextend)" suggests using dendextend::cutree(dend, k = nrCluster, order_clusters_as_data = FALSE).

But then I cannot use the output of dendextend::cutree to group the data, since its ordering does not match the row order of the data.

I would be happy to use a different dendrogram plotting library in R but so far my Web searches for "coloring dendrogram branches by cutree output" point to the dendextend package.

1 Answer


I'm sorry but I'm not sure I fully understand your question.

It seems like you want to align cutree's output with your original data. If that's the case, then you need to use dendextend::cutree(dend, k = nrCluster, order_clusters_as_data = TRUE), e.g.:

require(dendextend)
d1 <- USArrests[1:10,]
hc <- hclust(dist(d1))
dend <- as.dendrogram(hc)
k <- dendextend::cutree(dend, k = 3, order_clusters_as_data = TRUE)
d2 <- cbind(d1, k)
plot(color_branches(dend, 3))
d2
# an easier way to see the clusters is by ordering the rows of the data based on the order of the dendrogram
d2[order.dendrogram(dend),]

The plot is fine.

And the clusters are mapped correctly to the data (see outputs)

> require(dendextend)
> d1 <- USArrests[1:10,]
> hc <- hclust(dist(d1))
> dend <- as.dendrogram(hc)
> k <- dendextend::cutree(dend, k = 3, order_clusters_as_data = TRUE)
> d2 <- cbind(d1, k)
> plot(color_branches(dend, 3))
> d2
            Murder Assault UrbanPop Rape k
Alabama       13.2     236       58 21.2 1
Alaska        10.0     263       48 44.5 1
Arizona        8.1     294       80 31.0 2
Arkansas       8.8     190       50 19.5 1
California     9.0     276       91 40.6 2
Colorado       7.9     204       78 38.7 1
Connecticut    3.3     110       77 11.1 3
Delaware       5.9     238       72 15.8 1
Florida       15.4     335       80 31.9 2
Georgia       17.4     211       60 25.8 1
> # an easier way to see the clusters is by ordering the rows of the data based on the order of the dendrogram
> d2[order.dendrogram(dend),]
            Murder Assault UrbanPop Rape k
Connecticut    3.3     110       77 11.1 3
Florida       15.4     335       80 31.9 2
Arizona        8.1     294       80 31.0 2
California     9.0     276       91 40.6 2
Arkansas       8.8     190       50 19.5 1
Colorado       7.9     204       78 38.7 1
Georgia       17.4     211       60 25.8 1
Alaska        10.0     263       48 44.5 1
Alabama       13.2     236       58 21.2 1
Delaware       5.9     238       72 15.8 1
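
If you also want the group labels drawn on the plot to agree with the cluster ids stored in the data, one possible approach is sketched below. It rests on an assumption that I believe holds but that you should check on your dendextend version: color_branches(dend, k = ...) numbers the clusters from left to right along the dendrogram, which is the same numbering cutree returns with order_clusters_as_data = FALSE. Since cutree returns a named vector, that output can be re-indexed by row name to bring it back into the row order of the data. The names k_data, k_leaf and k_in_data_order are mine, chosen for illustration.

```r
library(dendextend)

hc <- hclust(dist(USArrests))
dend <- as.dendrogram(hc)

# Cluster ids in the row order of the data
k_data <- dendextend::cutree(dend, k = 4, order_clusters_as_data = TRUE)

# Cluster ids in leaf order (assumed to be numbered left to right)
k_leaf <- dendextend::cutree(dend, k = 4, order_clusters_as_data = FALSE)

# Re-index the leaf-ordered result by row name so it lines up with the data
k_in_data_order <- k_leaf[rownames(USArrests)]
d <- cbind(USArrests, cluster = k_in_data_order)

# Both vectors describe the same partition; only the id numbering may differ
print(table(k_data, k_in_data_order))

# Group labels drawn here should then correspond to d$cluster
plot(color_branches(dend, k = 4, groupLabels = TRUE))
```

The cross-table should have exactly one non-zero cell per row and per column, which shows that the two results differ only by a relabeling of the cluster ids.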

Please let me know if this answers your question or if you have follow-up questions.