Box plots are handy for summarizing continuous data, but box plots for rare subgroups (n < 10) are not always helpful. In a grouped box plot, is it possible to replace the box with the raw data points for just those groups that are rare?
Example:
library(ggplot2)
p <- ggplot(mpg, aes(class, hwy))
p + geom_boxplot()
This produces a box plot of hwy (continuous) for each class (car type). However, the frequencies for each class show that there are only 5 2seaters and 11 minivans. Instead of the box plots for 2seaters and minivans, I'd like to see the raw data (points, possibly jittered), while keeping the box plots for the other groups that meet an arbitrarily chosen minimum sample size (e.g. n = 20).
table(mpg$class)
   2seater    compact    midsize    minivan     pickup subcompact        suv
         5         47         41         11         33         35         62
Is that even possible?
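A rough sketch of one way this might work, assuming each layer can simply be given its own subset of the data: boxes for the common classes and jittered points for the rare ones. The names min_n, mpg_n, common and rare are made up for illustration, and the threshold of 20 is the arbitrary one from above; there may well be a cleaner way.

```r
library(ggplot2)

min_n <- 20  # hypothetical minimum sample size for drawing a box

# Attach the per-class sample size to every row
mpg_n <- as.data.frame(mpg)
mpg_n$n <- ave(mpg_n$hwy, mpg_n$class, FUN = length)

# Split into classes that get a box and classes that get raw points
common <- mpg_n[mpg_n$n >= min_n, ]
rare   <- mpg_n[mpg_n$n <  min_n, ]

# Each layer draws only its own subset; both share the same x axis
p <- ggplot(mpg_n, aes(class, hwy)) +
  geom_boxplot(data = common) +
  geom_jitter(data = rare, width = 0.1, height = 0)
p
```

With min_n set to 20, only 2seater and minivan should fall into the rare subset and be drawn as points; height = 0 keeps the jitter horizontal so the hwy values are not distorted.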
Cheers, Luc