4
votes

I am trying to plot the result of agglomerative clustering (UPGMA with Agnes) in the same 'style' as when plotting a tree using the package 'ape'. A simple example I include in the figure below Figure 1. A simple example of the final output required

The key issue is that I want to be able to color the leaves of the dendrogram based on the a pattern in the labels of the leaves. I tried two approaches: either I used hc2Newick or I used the code by Joris Meys as proposed in an answer to Change Dendrogram leaves . Both did not give a satisfactory output. It might be that I do not fully understand the way the dendrograms are constructed either. An ASCII save of the abundance.agnes.ave object (stored from running agnes) can be found on https://www.dropbox.com/s/gke9qnvwptltkky/abundance.agnes.ave .

When I use the first option (with hc2Newick from bioconductor's ctc package) I get the following figure when using this code:

write(hc2Newick(as.hclust(abundance.agnes.ave)),file="all_samples_euclidean.tre")
eucltree<-read.tree(file="all_samples_euclidean.tre")
eucltree.laz<-ladderize(eucltree,FALSE)
tiplabs<-eucltree$tip.label
numbertiplabs<-length(tiplabs)
colourtips<-rep("green",numbertiplabs)
colourtips[grep("II",tiplabs)]<-"red"
plot(eucltree.laz,tip.color=colourtips,adj=1,cex=0.6,use.edge.length=F)
add.scale.bar()

Using plot.phylo

This is obviously not ideal, the 'alignment' of the plot is not as I wanted. I supose this has to do with the branch length calculation however I do not have the foggiest idea how to solve this issue. Certainly when compared to the results of the colLab function, which look more like the dendrogram style I'd like to report. Also, using use.edge.length=T in the code above does give a clustering that is not 'aligned' properly: Plot.phylo with branch length

The second approach using Joris Meys' colLab function with the following code gives the next figure

clusDendro<-as.dendrogram(as.hclust(abundance.agnes.ave))
labelColors<-c("red","green")
clusMember<-rep(1,length(rownames(abundance.x)))
clusMember[grep("II",rownames(abundance.x))]<-2
names(clusMember)<-rownames(abundance.x)

colLab <- function(n)
{
  if(is.leaf(n)) {
    a <- attributes(n)
    # clusMember - a vector designating leaf grouping
    # labelColors - a vector of colors for the above grouping
    labCol <- labelColors[clusMember[which(names(clusMember) == a$label)]]
    attr(n, "nodePar") <- c(a$nodePar, lab.col = labCol)
  }
  n
}

clusDendro<-dendrapply(clusDendro, colLab)
plot(clusDendro,horiz=T,axes=F)

Using colLab This plot is getting closer to what I want, however I do not know why the open circles appear at the leaves and how to remove them.

Any help is much appreciated.

Kind regards,

FM

2

2 Answers

2
votes

This functionality is now available in a new package called "dendextend", built exactly for this sort of thing.

You can see many examples in the presentations and vignettes of the package, in the "usage" section in the following URL: https://github.com/talgalili/dendextend

An almost exact question was just answered in the following SO question:

https://stackoverflow.com/a/18832457/256662

0
votes

I wrote that code quite a while ago, and it appears there's something changed a little in the mechanism.

The plot.dendrogram function I used, has an argument nodePar. The behaviour has changed since the last time I used that function, and although that's used normally for the inner nodes, it apparently has an effect on the outer nodes as well. The default value for pch is 1:2 now, according to the help files.

Hence, you need to specifically specify pch=NA in the attributes you add to the outer nodes in the colLab function. Try adapting it like this:

colLab <- function(n)
{
  if(is.leaf(n)) {
    a <- attributes(n)
    # clusMember - a vector designating leaf grouping
    # labelColors - a vector of colors for the above grouping
    labCol <- labelColors[clusMember[which(names(clusMember) == a$label)]]

    attr(n, "nodePar") <- 
        if(is.list(a$nodePar)) c(a$nodePar, lab.col = labCol,pch=NA) else
                               list(lab.col = labCol,pch=NA)
  }
  n
}

On my machine, that solves the problem.

Alternatively, you could take a look at the argument use.edge.length of the function plot.phylo in the ape package. You set it to FALSE, but from your explanation I believe you want it to be set on the default, TRUE.

EDIT: In order to make the function more generic, it might be a good idea to add labelColors and clusMember as arguments to the function. My quick-n-dirty solution isn't the best example of clean code...

Also forget what I said about using the edge length. the ape package interpretes it as a real dendrogram and putting use.edge.length to TRUE will convert the edge lengths to evolution time. Hence the 'weird' outlining of the dendrogram.

Also note that in case the treeleafs don't have a nodePar attribute, adding extra parameters using the c() function will lead to undesired effects: if you add eg lab.cex=0.6, the c() function will create a vector instead of a list, and convert the value for lab.cex to character whenever there's a character value in the parameter list. In this case that's going to be the name of the color, and that explains the error you talk about in the comment.