I want to add the number of observations to this boxplot, not per group but separately for each factor level. I would also like to show the number of observations as part of the x-axis label, so that it looks something like this: "PF (N=12)". Furthermore, I would like to display the mean value of each box inside the box, expressed in millions so that each box does not carry a giant number.

Here is what I have got:

    give.n <- function(x){
      return(c(y = median(x)*1.05, label = length(x)))
    }

    mean.n <- function(x){
      x <- x/1000000
      return(c(y = median(x)*0.97, label = round(mean(x),2)))
    }


    ggplot(Soils_noctrl) +  
    geom_boxplot(aes(x=Slope,y=Events.g_Bacteria, fill = Detergent), 
               varwidth = TRUE) +
    stat_summary(aes(x = Slope, y = Events.g_Bacteria), fun.data = give.n, geom = "text", 
               fun = median,
               position = position_dodge(width = 0.75))+
    ggtitle("Cell Abundance")+
    stat_summary(aes(x = Slope, y = Events.g_Bacteria), 
               fun.data = mean.n, geom = "text", fun = mean, colour = "red")+
    facet_wrap(~ Location, scale = "free_x")+
    scale_y_continuous(name = "Cell Counts per Gram (Millions)", 
                     breaks = round (seq(min(0), 
                                         max(100000000), by = 5000000),1),
                     labels = function(y) y / 1000000)+
    xlab("Sample")

This is what the plot looks like so far: the mean value is at the bottom of the plot, and the numbers of observations are inside the boxes but not separated.

Thank you for your help! Cheers

It's really difficult to make a good recommendation without any sample data. See stackoverflow.com/a/5965451/4114240. My best guess is that your stat_summary is not inheriting the aes but defining a new one, and the new one does not include Detergent. So the code is putting the text where the boxplots would be if they weren't separated based on the Detergent factor. Only my best guess. HTH - markhogue
It might be easier to use geom_text for the sample size and mean. You can set the x and y coords, e.g. geom_text(aes(x = Slope, y = min(Events.g_Bacteria), label = give.n)) + geom_text(aes(x = Slope, y = 1.1 * min(Events.g_Bacteria), label = mean.n)) should put the sample number at the bottom and the mean just above that. You might need to play with the proportions a bit (e.g. 0.9*min(...), etc.) - mlcyo
Another possible idea is that the fill parameter splits the data between the combinations of the facet and x variables, but the median and mean functions are using all of the values within the given combinations. In particular, how many rows of your data fit AL_S and the Buot facet? Are there 9 of them? - statstew

1 Answer

TL;DR - you need to supply a group= aesthetic, since otherwise ggplot2 does not know which column it is supposed to use to dodge the text geom.

Unfortunately, we don't have your data, but here's an example set that can showcase the rationale here and the function/need for group=.

    set.seed(1234)
    df1 <- data.frame(detergent=c(rep('EDTA',15),rep('Tween',15)), cells=c(rnorm(15,10,1),rnorm(15,10,3)))
    df2 <- data.frame(detergent=c(rep('EDTA',20),rep('Tween',20)), cells=c(rnorm(20,1.3,1),rnorm(20,4,2)))
    df3 <- data.frame(detergent=c(rep('EDTA',30),rep('Tween',30)), cells=c(rnorm(30,5,0.8),rnorm(30,3.3,1)))

    df1$smp='Sample1'
    df2$smp='Sample2'
    df3$smp='Sample3'

    df <- rbind(df1,df2,df3)

Instead of using stat_summary(), I'm just going to create a separate data frame to hold the mean values I want to include as text on my plot:

    library(dplyr)  # provides %>%, group_by() and summarize()
    summary_df <- df %>% group_by(smp, detergent) %>% summarize(m=mean(cells))

Now, here's the plot and use of geom_text() with dodging:

    library(ggplot2)
    p <- ggplot(df, aes(x=smp, y=cells)) +
      geom_boxplot(aes(fill=detergent))

    p + geom_text(data=summary_df,
        aes(y=m, label=round(m,2)),
        color='blue', position=position_dodge(0.8)
      )

You'll notice the numbers are all separated along y= just fine, but the "dodging" is not working. This is because we have not supplied any information on how to do the dodging. In this case, the group= aesthetic can be supplied to let ggplot2 know which column to use for the dodging:

    p + geom_text(data=summary_df,
        aes(y=m, label=round(m,2), group=detergent),
        color='blue', position=position_dodge(0.8)
      )

You don't have to supply the group= aesthetic if you supply another discrete aesthetic such as color= or fill=. In cases where you give both a color= and a group= aesthetic, the group= aesthetic will override any of the others for dodging purposes. Here's the same example, but without a group= aesthetic, because I've moved color= up into the aes() (changing fill to greyscale so that you can see the text):

    p + geom_text(data=summary_df,
          aes(y=m, label=round(m,2), color=detergent),
          position=position_dodge(0.8)
      ) + scale_fill_grey()
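
If you want to check which grouping ggplot2 actually used, one option is to inspect the computed layer data with layer_data(). The following is only a minimal sketch: toy_means is a made-up miniature of summary_df, and the variable names are hypothetical. It counts the distinct x positions the labels end up at with no grouping aesthetic, with group=, and with color=.

```r
library(ggplot2)

# Hypothetical miniature of summary_df: one mean per sample/detergent pair
toy_means <- data.frame(
  smp       = rep(c("Sample1", "Sample2"), each = 2),
  detergent = rep(c("EDTA", "Tween"), times = 2),
  m         = c(10.1, 9.4, 1.2, 4.3)
)

base  <- ggplot(toy_means, aes(x = smp, y = m, label = m))
dodge <- position_dodge(0.8)

# Computed data for the text layer under three different aesthetic mappings
no_group   <- layer_data(base + geom_text(position = dodge))
with_group <- layer_data(base + geom_text(aes(group = detergent), position = dodge))
with_color <- layer_data(base + geom_text(aes(color = detergent), position = dodge))

# Number of distinct x positions used for the labels in each variant
sapply(list(none = no_group, group = with_group, color = with_color),
       function(d) length(unique(d$x)))
```

Without a grouping aesthetic, both labels of a sample share one x position; with group= or color= mapped to detergent, each label gets its own dodged position.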

FUN FACT: Dodging still works even if you supply geom_text() with a nonsensical aesthetic that would normally work for dodging, such as fill=. You get a warning message Ignoring unknown aesthetics: fill, but the dodging still works:

    p + geom_text(data=summary_df,
          aes(y=m, label=round(m,2), fill=detergent),
          position=position_dodge(0.8)
      )
    # gives you the same plot as if you just supplied group=detergent, but with black text

In your case, changing your stat_summary() line to this should work:

    stat_summary(aes(x = Slope, y = Events.g_Bacteria, group = Detergent),...
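
As a rough end-to-end sketch of that change, here is a version on made-up data. Everything named toy, give_n, mean_millions, n_by_slope and label_with_n is hypothetical and only stands in for your Soils_noctrl data and helper functions. It also touches on two other points from the question. First, your mean.n() divides x by 1000000 before computing y, so the label's y position is in millions while the axis is on the raw scale, which is likely why the means sit at the bottom of the plot; the sketch keeps y on the raw scale and only rescales the label. Second, the "PF (N=12)" style axis labels are built with a labelling function passed to scale_x_discrete().

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for Soils_noctrl; column names and values are made up
toy <- data.frame(
  Slope     = rep(c("PF", "AL"), each = 20),
  Detergent = rep(rep(c("EDTA", "Tween"), each = 10), times = 2),
  cells     = rnorm(40, mean = 5e7, sd = 1e7)
)

# Number of observations per dodged box, placed just above the median
give_n <- function(x) {
  data.frame(y = median(x) * 1.05, label = length(x))
}

# Mean in millions as the label; y stays on the raw scale of the axis
mean_millions <- function(x) {
  data.frame(y = median(x) * 0.97, label = round(mean(x) / 1e6, 2))
}

# Axis labels such as "PF (N=20)", counted per x category over the whole data
n_by_slope   <- table(toy$Slope)
label_with_n <- function(v) paste0(v, " (N=", as.integer(n_by_slope[v]), ")")

p_toy <- ggplot(toy, aes(x = Slope, y = cells)) +
  geom_boxplot(aes(fill = Detergent)) +
  stat_summary(aes(group = Detergent), fun.data = give_n, geom = "text",
               position = position_dodge(width = 0.75)) +
  stat_summary(aes(group = Detergent), fun.data = mean_millions, geom = "text",
               colour = "red", position = position_dodge(width = 0.75)) +
  scale_x_discrete(labels = label_with_n) +
  scale_y_continuous(name = "Cell Counts per Gram (Millions)",
                     labels = function(y) y / 1e6)

# Computed text layers, useful for checking counts and label positions
n_labels    <- layer_data(p_toy, 2)
mean_labels <- layer_data(p_toy, 3)
```

Note that n_by_slope counts over the whole data frame. With facet_wrap(~ Location), the count per facet may differ, so you would probably need to compute the counts per Location and Slope instead.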