Dear stack overflow community,
I would like to ask you for help with my problem. I am using package ggtree to plot phylogenetic trees and I would like to show on these plots more information as it is usual in papers. I am especially interested in having a tree with coloured branches (with blended gradient) showing some variation in a continuous trait and some point at the end of the branches indicating with color (or shape) a discrete trait. While I am able to do both stuff separately, I totally failed trying to plot both such things in one plot. Can you help, please?
Here I am providing you with reproducible example. Lets have this random tree (tree.1) with nine species and some random branch lengths and this randon table of data about these species (data1):
###STACK EXAMPLE
source("https://bioconductor.org/biocLite.R")
biocLite("ggtree")
library(ggtree)
tree.1<-read.tree(text="(spec1:2.2,((spec2:1.8,(spec9:1.4,(spec3:1.3,spec5:1.3):0.1):0.4):0.2,(spec8:1.7,(spec6:1.5,(spec7:1,spec4:1):0.5):0.2):0.3):0.2);")
data1<-data.frame(row.names = c("spec1","spec2","spec3","spec4","spec5","spec6","spec7","spec8","spec9"),
"tip" = c("spec1","spec2","spec3","spec4","spec5","spec6","spec7","spec8","spec9"),
"colour" = c("red", "red", "blue", "red", "red", "blue", "blue", "red", "blue"),
"fylo.signal" = c(0.1, 1.0, 0.3, 0.6, 0.2, 0.8, 0.7, 0.3, 0.6))
If you look at the data there is the colour column which is my discrete variable and the fylo.signal which is a random continuous variable.
To make those graphs I was following two examples (this one for gradient colour of branches and my old question for colour of the points at the end of branches).
I can start with the colour gradient branches. There is a little bit of black box to me before plotting the data but I guess I understood at least a bit what it does. First I extract just the continuous variable (b) and calculate the "nodes" for my tree (a) and then calculate my continuous variable for all the "non tip" nodes in my tree, i.e. not just the end. Then I merge the data together.
b <- as.matrix(data1)[,3]
a <- data.frame(node = nodeid(tree.1, names(b)),
signal = b)
fit2 <- phytools::fastAnc(tree.1,b,vars=TRUE,CI=TRUE)
c <- data.frame(node = names(fit2$ace), signal = fit2$ace)
d.1 <- rbind(a, c)
d.1$node <- as.numeric(d.1$node)
d.1$signal <- as.numeric(d.1$signal)
After that I insert also the discrete variable (and I make the internal nodes have "NA" for this "colour"):
colour.vector <- c(data1$colour, rep(NA, nrow(d.1)-nrow(data1)))
d.2 <- cbind(d.1, colour.vector)
d.2
... and then I insert these data to the phylogenetic tree itself:
tree.2 <- dplyr::full_join(tree.1, d.2, by = 'node')
Now for the plotting. I can make the gradient colour of the branches to represent my continuous variable. Following code produce this plot:
## example1 (SEPARATE TREES)
t1 <- ggtree(tree.2, aes(color=signal), layout = 'circular',
ladderize = FALSE, continuous = TRUE, size=2) +
ggplot2::scale_color_gradientn(colours=c('red', 'orange', 'green', 'cyan', 'blue')) +
geom_tiplab(hjust = -.1, offset=.1) +
theme(legend.position = c(.05, .85))
t1
... and this image when I try plot my discrete variable as differently coloured points at the end of branches (make note that even though the colours are inverted, it actually follow the dataset I was using):
t2 <- ggtree(tree.2, layout = 'circular') + geom_tiplab(hjust = -.1, offset=.1)
t2 <- t2 %<+% data1 + geom_tippoint(pch=16, size=4, aes(col=colour))
t2
But when I try to combine these two, it will produce an error:
## example 1.5 (ERROR)
t3 <- t1 %<+% data1 + geom_tippoint(pch=16, size=4, aes(col=colour))
t3 ## Error: Discrete value supplied to continuous scale
I guess when the function "aes" is used when making the tree, it cannot be override for subparts of the plot? I don't understand that. My best shot is following code:
## example 2 (WRONG ORDER OF COLOURS)
t4 <- ggtree(tree.2, aes(color=signal), layout = 'circular',
ladderize = FALSE, continuous = TRUE, size=2) +
ggplot2::scale_color_gradientn(colours=c('red', 'orange', 'green', 'cyan', 'blue')) +
geom_tiplab(hjust = -.1, offset=.1) +
theme(legend.position = c(.05, .85)) +
geom_tippoint(pch=16, size=4, color=as.factor(colour.vector[1:9]))
t4
... which actually makes this WRONG picture. The points at the end of the branches are coloured but not according to what was in the original dataset. They follow the order in the dataset but are not assigned to correct "species". Species were coloured according to the sequence from dataset from "spec1" counter clockwise. I cannot make the ggtree follow actually the "species", as in my second plot above using the same code.
Anyone can help, please?