3
votes

I have the following data.frame:

df = data.frame(
  ymin = c(0.35,0.4,0.25,0.3,0.55,0.6),
  lower = c(0.45,0.5,0.35,0.4,0.65,0.7),
  middle = c(0.5,0.55,0.4,0.45,0.7,0.75),
  upper = c(0.55,0.6,0.45,0.5,0.75,0.8),
  ymax = c(0.65,0.7,0.55,0.6,0.85,0.9),
  factor = c("parental","parental","cross","cross","sex","sex"),
  factor.label = c("paternal","maternal","F1i","F1R","M","F"),
  posterior.probability = c(0.92,0.92,0.97,0.97,0.99,0.99),
  x = c(1,1,2,2,3,3),
  colors = c("blue","red","gray30","gray70","lightskyblue","pink")
)

I want to produce a ggplot2 box plot where df$x defines the x axis locations of the boxes, df$lower, df$upper, df$middle, df$ymin, and df$ymax define the boxes, and df$colors defines the colors of the boxes. As you can see, each pair of boxes should be placed at the same x axis location. With the alpha parameter I'll make the boxes transparent so that overlaps are visible.

What I have so far is this code:

library(ggplot2)
p = ggplot(df, aes(x = x, color = colors))
p = p + geom_boxplot(aes(lower = lower,upper = upper, middle = middle, ymin = ymin, ymax = ymax, fill = colors), position = position_dodge(width = 0), width = 0.5, alpha = 0.5, stat = "identity")

Obviously the colors of the boxes are messed up: the "blue" and "red" boxes should be at df$x = 1, the "gray30" and "gray70" boxes should be at df$x = 2, and the "lightskyblue" and "pink" boxes should be at df$x = 3. So I'm looking to fix that.

I additionally want the legend title and labels to be specified rather than generated by default. In addition, the df$posterior.probability values are identical for each pair of boxes at the same x axis location, and I would like to draw these values (one per pair) either at the top of the plot, say at the maximum y axis value, or on top of the max(ymax) of each pair of boxes, at the corresponding x axis locations. In other words, 0.92, 0.97, and 0.99 will be drawn at x locations 1, 2, and 3, respectively, and at either the maximum y location of the plot or at these y locations: 0.75, 0.65, and 0.95, respectively.

2 Answers

2
votes

Your boxplots are overlapping and, as I understand from your code, you don't want that. You don't need the position argument in your code, and you have to treat your x as a factor in order to get the boxes plotted next to each other.

With this code:

ggplot(df, aes(x = as.factor(x), color = colors)) + 
  geom_boxplot(aes(lower = lower,upper = upper, middle = middle, ymin = ymin, ymax = ymax, fill = colors), width = 0.5, alpha = 0.5, stat = "identity") +
  geom_text(data = df, aes(x = as.factor(x), y = ymax, label = posterior.probability), size = 4, vjust = -0.5) +
  scale_fill_identity("Color legend", guide = "legend", labels = c("paternal","F1i","F1R","M","F","maternal")) +
  scale_color_identity("Color legend", guide = "legend", labels = c("paternal","F1i","F1R","M","F","maternal")) +
  labs(title = "Plot title", x = "X-lab label", y = "Y-lab label")

If you don't want the text labels to take the same colors as the boxes, add color = "black" to the geom_text() call, outside aes().
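A minimal sketch of that change, assuming the data frame from the question (rebuilt here with stringsAsFactors = FALSE so the block runs on its own). Setting color as a constant outside aes() overrides the color = colors mapping inherited from ggplot() for the text layer only; the object name p is just for illustration.

```r
library(ggplot2)

# Data frame from the question, reduced to the columns this plot uses.
df <- data.frame(
  ymin = c(0.35, 0.4, 0.25, 0.3, 0.55, 0.6),
  lower = c(0.45, 0.5, 0.35, 0.4, 0.65, 0.7),
  middle = c(0.5, 0.55, 0.4, 0.45, 0.7, 0.75),
  upper = c(0.55, 0.6, 0.45, 0.5, 0.75, 0.8),
  ymax = c(0.65, 0.7, 0.55, 0.6, 0.85, 0.9),
  posterior.probability = c(0.92, 0.92, 0.97, 0.97, 0.99, 0.99),
  x = c(1, 1, 2, 2, 3, 3),
  colors = c("blue", "red", "gray30", "gray70", "lightskyblue", "pink"),
  stringsAsFactors = FALSE
)

p <- ggplot(df, aes(x = as.factor(x), color = colors)) +
  geom_boxplot(aes(lower = lower, upper = upper, middle = middle,
                   ymin = ymin, ymax = ymax, fill = colors),
               width = 0.5, alpha = 0.5, stat = "identity") +
  # color is a fixed value here, so the labels no longer inherit the box colors
  geom_text(aes(y = ymax, label = posterior.probability),
            color = "black", size = 4, vjust = -0.5) +
  scale_fill_identity() +
  scale_color_identity()
```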

2
votes

For the labels, make a new data frame that contains only one label for each x position, with the y value calculated as the maximum ymax value at that position.

library(plyr)
df.text<-ddply(df,.(x),summarise,y=max(ymax),label=max(posterior.probability))
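If you would rather not load plyr for this one step, the same summary can be sketched in base R with aggregate(). This is an alternative under the assumption that only x, ymax, and posterior.probability are needed; it is not part of the plyr approach above, and the reduced df here only repeats the relevant columns from the question.

```r
# Columns from the question's data frame that the summary needs.
df <- data.frame(
  x = c(1, 1, 2, 2, 3, 3),
  ymax = c(0.65, 0.7, 0.55, 0.6, 0.85, 0.9),
  posterior.probability = c(0.92, 0.92, 0.97, 0.97, 0.99, 0.99)
)

# One row per x position: the highest ymax of the pair and the (shared)
# posterior probability, named y and label to match the ddply() result.
df.text <- aggregate(cbind(y = ymax, label = posterior.probability) ~ x,
                     data = df, FUN = max)
```

The resulting df.text has the columns x, y, and label, so it can be passed to geom_text() in the same way.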

Since you supply color names, you should use scale_fill_identity() and scale_color_identity() to tell ggplot to interpret those values as actual colors. If you need to show a legend, add the argument guide="legend" to scale_fill_identity() and scale_color_identity(), and provide the labels= you want to show in the legend. Then use geom_text() with the new data frame to add the labels above the boxplots.

ggplot(df, aes(x = x)) +
  geom_boxplot(aes(lower = lower, upper = upper, middle = middle, ymin = ymin,
                   ymax = ymax, color = colors, fill = colors),
               position = position_dodge(width = 0), width = 0.5,
               alpha = 0.5, stat = "identity") +
  scale_fill_identity("Legend name", guide = "legend",
                      labels = c("paternal","F1i","F1R","M","F","maternal")) +
  scale_color_identity("Legend name", guide = "legend",
                       labels = c("paternal","F1i","F1R","M","F","maternal")) +
  geom_text(data = df.text, aes(x = x, y = y, label = label))
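In the code above, the labels= vector is written by hand in the order in which the legend lists the colors (blue, gray30, gray70, lightskyblue, pink, red). As a variation, and only a sketch, the pairing can be made explicit by passing the color column as breaks= and the factor.label column as labels=, so each label stays attached to its own color. The data frame is rebuilt from the question with stringsAsFactors = FALSE so that both columns are plain character vectors; the object name p is illustrative.

```r
library(ggplot2)

# Data frame from the question, reduced to the columns this plot uses.
df <- data.frame(
  ymin = c(0.35, 0.4, 0.25, 0.3, 0.55, 0.6),
  lower = c(0.45, 0.5, 0.35, 0.4, 0.65, 0.7),
  middle = c(0.5, 0.55, 0.4, 0.45, 0.7, 0.75),
  upper = c(0.55, 0.6, 0.45, 0.5, 0.75, 0.8),
  ymax = c(0.65, 0.7, 0.55, 0.6, 0.85, 0.9),
  factor.label = c("paternal", "maternal", "F1i", "F1R", "M", "F"),
  x = c(1, 1, 2, 2, 3, 3),
  colors = c("blue", "red", "gray30", "gray70", "lightskyblue", "pink"),
  stringsAsFactors = FALSE
)

p <- ggplot(df, aes(x = x)) +
  geom_boxplot(aes(lower = lower, upper = upper, middle = middle,
                   ymin = ymin, ymax = ymax, color = colors, fill = colors),
               position = position_dodge(width = 0), width = 0.5,
               alpha = 0.5, stat = "identity") +
  # breaks and labels are matched position by position, so row order in df
  # decides both the pairing and the order of the legend entries
  scale_fill_identity("Legend name", guide = "legend",
                      breaks = df$colors, labels = df$factor.label) +
  scale_color_identity("Legend name", guide = "legend",
                       breaks = df$colors, labels = df$factor.label)
```

With this form the legend follows the row order of df (paternal, maternal, F1i, F1R, M, F) rather than the alphabetical order of the color names.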
