I've got data with two categorical variables. I can boxplot these but I can't get the means to display in the correct position. I've created the effect in the iris dataset (the red rectangles are added by hand, not in ggplot).
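library(dplyr)
library(ggplot2)

# Label each observation as "high" or "low" sepal length
Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))
# Mean sepal width for each Species / SepalLengthType combination
means <- Iris %>%
  group_by(Species, SepalLengthType) %>%
  summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")
# One box per Species within each SepalLengthType category
plot <- ggplot(data = Iris, aes(y = Sepal.Width, x = SepalLengthType, colour = Species)) +
  geom_boxplot()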
Now I want to add the mean to each box plot. The lines below all run, but each mean is centred on the SepalLengthType category rather than on its own box plot.
plot + stat_summary(fun = "mean" , aes(color = Species), shape = 15)
plot + stat_summary(fun = "mean" , aes(group = Species), shape = 15)
plot + stat_summary(fun.y = "mean", shape = 15) # this works, but is deprecated
plot + geom_point(data = means, aes(color = Species), shape = 15)
How can the means be displayed in the middle of each box plot? I appreciate I could rearrange the data so that each set of data points is in its own column, but as the sets are not all the same length, this needs its own workarounds.
When I use fun = "mean" I get a warning message "Removed 5 rows containing missing values (geom_segment)." Why is that? The 'means' line does not have this problem but I'd rather not have to calculate the means myself.
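For reference, here is a sketch of one approach that might address both points; it is not a confirmed fix. It assumes that the box plots are dodged with ggplot2's default box width of 0.75, so the means are given a matching `position_dodge(width = 0.75)`. It also assumes the warning arises because `stat_summary()` draws with its default "pointrange" geom, which needs `ymin` and `ymax` values that `fun` alone does not supply; asking for `geom = "point"` avoids drawing that segment. The name `plot_means` is made up for this example.

```r
library(dplyr)
library(ggplot2)

# Same setup as in the question
Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

plot <- ggplot(Iris, aes(x = SepalLengthType, y = Sepal.Width, colour = Species)) +
  geom_boxplot()

# Assumption: a dodge width of 0.75 matches the default box plot width,
# so each mean is shifted sideways by the same amount as its box.
# geom = "point" draws only the mean, with no range segment.
plot_means <- plot +
  stat_summary(
    fun = "mean",
    geom = "point",
    shape = 15,
    position = position_dodge(width = 0.75)
  )

plot_means
```

If the box plots were drawn with a non-default `width`, the dodge width would presumably need to change to match.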