
I have a grouped boxplot built from data with three categorical variables. One is mapped to the x-axis, another to the fill, and the third is used for faceting. I want to display the mean for each fill group, but stat_summary only gives me one mean per x-axis category, without separating the means by fill:

[Image: faceted boxplots]

Here is the current code:

demoplot <- ggplot(demo, aes(x=variable, y=value))
demoplot + geom_boxplot(aes(fill=category2), position=position_dodge(.9)) +
  stat_summary(fun.y=mean, colour="black", geom="point", shape=18, size=4) +
  facet_wrap(~category1)

Is there any way to display the mean for each category2 without having to manually compute and plot the points? Adjusting position_dodge doesn't help, since only one mean is computed per x-axis category. Would creating conditions within the mean() function be advisable?

For anyone interested, here's the data:

Thanks in advance for any help with this.

Your link does not work. geom_boxplot() allows you to compute your own stats (docs.ggplot2.org/dev/geom_boxplot.html) – MLavoie

@MLavoie Is the link dead? Not sure why; I used a bit.ly link to dropbox.com/s/mlvx0hu3rwuxtgj/demo.csv?dl=0. I see, are you suggesting I use the stat within geom_boxplot()? – dizzygirl

If you scroll down you will see this example (just adapt it to your case):
y <- rnorm(100)
df <- data.frame(
  x = 1,
  y0 = min(y),
  y25 = quantile(y, 0.25),
  y50 = median(y),
  y75 = quantile(y, 0.75),
  y100 = max(y)
)
ggplot(df, aes(x)) +
  geom_boxplot(
    aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
    stat = "identity"
  )
– MLavoie

@MLavoie I am not sure about this one; are you recommending that it would be easier to create the boxplots manually, one by one, by entering the quartiles in the code? – dizzygirl

Here is an example of what I mean: stackoverflow.com/questions/34081405/… – MLavoie

1 Answer


ggplot needs explicit grouping information here. You can supply it either with aes(group = ...) in the relevant layer, or by moving fill = ... into the main call to ggplot(). Without explicit grouping, a layer is grouped by the discrete variables mapped in that layer, and the stat_summary layer only sees the factor on the x-axis. Here's some sample code with fake data:

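library(ggplot2)
set.seed(123)

# fake data: three categorical variables and one numeric response
nobs <- 1000
dat <- data.frame(var1=sample(LETTERS[1:3], nobs, TRUE),
                  var2=sample(LETTERS[1:2], nobs, TRUE),
                  var3=sample(LETTERS[1:3], nobs, TRUE),
                  y=rnorm(nobs))

# aes(group=var2) makes stat_summary compute one mean per var2 level within
# each var1; using the same dodge width lines the points up with the boxes
# (in ggplot2 >= 3.3.0, fun.y is deprecated in favour of fun)
p1 <- ggplot(dat, aes(x=var1, y=y)) +
  geom_boxplot(aes(fill=var2), position=position_dodge(.9)) +
  facet_wrap(~var3) +
  stat_summary(fun.y=mean, geom="point", aes(group=var2), position=position_dodge(.9),
               color="black", size=4)
p1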

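The other option mentioned in this answer, moving fill into the main ggplot() call, could look roughly like the sketch below. It rebuilds the same fake data so that it runs on its own. The name p2 is only illustrative, fun = mean assumes ggplot2 3.3.0 or later (older versions use fun.y), and show.legend = FALSE is an optional extra that is not part of the original code.

```r
library(ggplot2)
set.seed(123)

# same fake data as in the example above
nobs <- 1000
dat <- data.frame(var1 = sample(LETTERS[1:3], nobs, TRUE),
                  var2 = sample(LETTERS[1:2], nobs, TRUE),
                  var3 = sample(LETTERS[1:3], nobs, TRUE),
                  y = rnorm(nobs))

# fill is mapped in the main ggplot() call, so every layer inherits it;
# stat_summary is then grouped by var1 x var2 without an explicit group
p2 <- ggplot(dat, aes(x = var1, y = y, fill = var2)) +
  geom_boxplot(position = position_dodge(.9)) +
  facet_wrap(~var3) +
  # assumption: ggplot2 >= 3.3.0; use fun.y = mean on older versions
  stat_summary(fun = mean, geom = "point", position = position_dodge(.9),
               color = "black", size = 4, show.legend = FALSE)
p2
```

To check the grouping without looking at the plot, layer_data(p2, 2) returns the data computed for the stat_summary layer, which should contain one mean per var1/var2 combination in each facet.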