I want to set the color of the branches of my dendrogram according to manually assigned groups of leaves. I know in advance that, for example, leaves A-C should be red, and every branch that leads only to red leaves should be red as well.
I can color the branches of my dendrogram using the "dendextend" package.
However, I have no control over which color gets assigned to which cluster ID. dendextend assigns the first color to the first cluster ID it finds, regardless of whether that is ID 1. I need ID 1 drawn in color 1, ID 2 in color 2, and so on, because I also need a legend.
See this example. I want a dendrogram that colors the labels and branches of A-C in red, D-F in blue and G-I in green.
```r
suppressPackageStartupMessages(library(dendextend))
library(dplyr)

set.seed(12346)

# Sample data:
# ------------
# l = Leaf labels | g = assigned color of leaf | x = value for clustering
dat <- tibble(l = LETTERS[1:9],
              g = factor(rep(letters[1:3], each = 3)),
              x = round(runif(9, 0, 10)))

# color_branches() needs integer cluster IDs
dat$gi <- dat$g %>% as.integer()

# Color IDs of each group
dat %>% distinct(g, gi)
## # A tibble: 3 x 2
##   g        gi
##   <fct> <int>
## 1 a         1
## 2 b         2
## 3 c         3

# ID 1 = red, ID 2 = blue, ID 3 = green
clucols <- c("red", "blue", "green")

# Clustering & Dendrogram
# -----------------------
dst <- dist(setNames(dat$x, dat$l))
den <- as.dendrogram(hclust(dst))
o   <- order.dendrogram(den)  # leaf order as drawn, left to right

den <- den %>%
  color_branches(col = clucols, clusters = dat$gi[o])

# Transfer branch colors to labels
labels_colors(den) <- get_leaves_branches_col(den)

plot(den)

# Legend
dat %>% distinct(g, gi) %>%
  {legend("topright", legend = .$g, col = clucols[.$gi], lty = 1)}
```
Result:
The leaves are not colored in the order I want, but by the position of their cluster in the plot, from left to right.
If you change the `set.seed(...)` line to `set.seed(12345)`, the coloring looks correct. But that is only because, read from left to right, the clusters happen to appear in the right order by chance.
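To make the difference concrete, here is a minimal base-R sketch of the two assignment rules. It does not call dendextend at all: the vector `ids_in_plot_order` is a hypothetical stand-in for `dat$gi[o]`, with made-up values, and the positional rule is only my reading of the behavior described above.

```r
# Hypothetical cluster IDs of the leaves, read left to right in the plot
# (illustrative values, NOT taken from the example data above).
ids_in_plot_order <- c(2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L)

# Intended mapping: ID 1 = red, ID 2 = blue, ID 3 = green
clucols <- c("red", "blue", "green")

# Distinct IDs in order of first appearance, left to right
first_seen <- unique(ids_in_plot_order)

# Positional rule (what I seem to get): the k-th ID encountered gets the k-th color
by_position <- setNames(clucols[seq_along(first_seen)], first_seen)

# Rule by ID (what I want): each ID indexes directly into clucols
by_id <- setNames(clucols[first_seen], first_seen)

print(by_position)
print(by_id)
```

Under the positional rule, ID 2 ends up red simply because it is drawn first; under the rule by ID, it is blue as intended. If that reading is right, passing the color vector reordered as `clucols[first_seen]` might be one thing to try, but I have not verified this, and it assumes each group forms a single contiguous run of leaves in the plot.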
How do I make `color_branches()` assign colors by cluster ID, not by which cluster comes first?
Other SO questions I tried
Dendextend: Regarding how to color a dendrogram’s labels according to defined groups: This question is related, but it only targets coloring labels.
Color dendrogram branches based on external labels uptowards the root until the label matches: An answer there proposed `branches_attr_by_clusters`, which I translated into my example like this:

```r
den <- den %>%
  branches_attr_by_clusters(values   = clucols[dat$gi[o]],
                            clusters = dat$gi[o],
                            attr     = "col")
```

However, the result was the same.