I have data with two categorical variables. I can draw boxplots for them, but I can't get the means to display in the correct position. I've reproduced the effect with the iris dataset (the red rectangles were added by hand, not in ggplot).

Image: the means are not plotted with the relevant boxplot

library(tidyverse)  # provides dplyr (%>%, mutate, summarise) and ggplot2

Iris <- iris %>%
        mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

means <- Iris %>% 
        group_by(Species, SepalLengthType) %>% 
        summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")

plot <- ggplot(data = Iris, aes(y = Sepal.Width, x = SepalLengthType, colour = Species)) +
        geom_boxplot()

Now I want to add the mean to each box plot. The lines below all run, but each mean is centred on the SepalLengthType category rather than on its own box plot.

plot + stat_summary(fun = "mean" , aes(color = Species), shape = 15)
plot + stat_summary(fun = "mean" , aes(group = Species), shape = 15)
plot + stat_summary(fun.y = "mean", shape = 15) # this works, but is deprecated
plot + geom_point(data = means, aes(color = Species), shape = 15)

How can the means be displayed in the middle of each box plot? I appreciate I could rearrange the data so each set of data points is in its own column, but as they are not all the same length, this needs its own workarounds.

When I use fun = "mean" I get the warning "Removed 5 rows containing missing values (geom_segment)." Why is that? The geom_point line that uses 'means' does not have this problem, but I'd rather not have to calculate the means myself.

1 Answer

Dodge the boxplots with position = position_dodge(0.9) and dodge the means by the same width, as in the following code:

library(tidyverse)

Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

means <- Iris %>% 
  group_by(Species, SepalLengthType) %>% 
  summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")

plot <- ggplot(data = Iris, aes(y=Sepal.Width, x = SepalLengthType, colour = Species))+
  geom_boxplot(position=position_dodge(0.9))

plot + geom_point(data = means, aes(color = Species), shape = 15, 
                  position = position_dodge2(width = 0.9))

Or, using stat_summary:

plot + stat_summary(fun = "mean", aes(group = Species), shape = 15, 
                  position = position_dodge2(width = 0.9))

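A possible variation, offered as a sketch rather than a definitive fix: storing one position_dodge object and passing it to both layers keeps the two dodge widths in sync. Setting geom = "point" should also avoid the "Removed 5 rows" warning from the question. That warning appears to come from stat_summary's default pointrange geom, which has no ymin/ymax to draw when only fun is supplied. The names dodge and p are my own and are not part of the code above.

```r
library(dplyr)
library(ggplot2)

Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

# One shared dodge object (hypothetical name) so both layers use the same width
dodge <- position_dodge(width = 0.9)

p <- ggplot(Iris, aes(x = SepalLengthType, y = Sepal.Width, colour = Species)) +
  geom_boxplot(position = dodge) +
  # geom = "point" draws only the mean, so no range segments are needed
  stat_summary(fun = "mean", geom = "point", shape = 15, position = dodge)

p
```

Because colour = Species is set in the main ggplot() call, stat_summary inherits the grouping and computes one mean per species within each SepalLengthType, so the separate means data frame is not needed.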