I am making boxplots with ggplot using data that is classified by two factor variables. I'd like the box widths to reflect sample size via varwidth = TRUE, but when I do this the boxes overlap.
1) Some sample data with a 3 x 2 structure
library(ggplot2)
data <- data.frame(group1 = sample(c("A", "B", "C"), 100, replace = TRUE), group2 = sample(c("D", "E"), 100, replace = TRUE), response = rnorm(100, mean = 0, sd = 1))
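As a quick check on what the variable widths should reflect, the number of observations in each group1 x group2 cell can be tabulated. This is only a minimal sketch: set.seed(42) and the names cell_n and rel_width are additions for illustration, and the square-root scaling follows the ggplot2 documentation for varwidth rather than anything verified against the plots here.

```r
# Same 3 x 2 layout as the sample data; the seed is an added assumption
set.seed(42)
data <- data.frame(
  group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
  group2 = sample(c("D", "E"), 100, replace = TRUE),
  response = rnorm(100, mean = 0, sd = 1)
)

# Observations per group1 x group2 cell
cell_n <- table(data$group1, data$group2)
print(cell_n)

# Relative widths that varwidth aims for: proportional to sqrt(n),
# scaled here so the largest cell has width 1
rel_width <- sqrt(cell_n) / max(sqrt(cell_n))
print(round(rel_width, 2))
```

Because the cells are sampled at random, the counts (and so the widths) differ somewhat between cells, which is what the boxes are meant to show.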
2) Default boxplots: ggplot without variable width
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot()
I like how the first level of grouping is shown.
Now I try to add variable widths...
3) ...and what I get when varwidth = TRUE
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot(varwidth = TRUE)
This overlap seems to occur whether I use color = group2 or group = group2, and whether I put it in the main call to ggplot or in the geom_boxplot call. Fussing with position_dodge doesn't seem to help either.
4) A solution I don't like visually is to make unique factor levels by combining group1 and group2:
data$grp.comb <- paste(data$group1, data$group2)
ggplot(data = data, aes(y = response, x = grp.comb, color = group2)) + geom_boxplot()
I prefer having the boxes grouped to reflect the cross-classification.
5) The way forward:
I'd like to either a) figure out how to make varwidth = TRUE not cause the boxes to overlap, or b) manually adjust the space between the combined groups so that boxes within the first level of grouping are closer together.
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot(varwidth = TRUE, alpha = 0.5)
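For option b), one possible direction is to place the boxes at hand-computed numeric x positions and turn dodging off, so that only varwidth controls the box widths. The following is a rough sketch under stated assumptions, not a confirmed fix: the xpos column, the offset vector, and the 0.2 spacing are hypothetical choices made for illustration, and set.seed(42) is added only for reproducibility.

```r
library(ggplot2)

set.seed(42)  # assumption: added for reproducibility
data <- data.frame(
  group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
  group2 = sample(c("D", "E"), 100, replace = TRUE),
  response = rnorm(100, mean = 0, sd = 1)
)

# Make both grouping variables factors so the level order is explicit
data$group1 <- factor(data$group1)
data$group2 <- factor(data$group2)

# Hypothetical helper column: integer position of the group1 level plus
# a small offset per group2 level (-0.2 for "D", +0.2 for "E")
offset <- c(-0.2, 0.2)
data$xpos <- as.numeric(data$group1) + offset[as.numeric(data$group2)]

p <- ggplot(data, aes(x = xpos, y = response, color = group2,
                      group = interaction(group1, group2))) +
  # position = "identity": no dodging on top of the manual offsets
  geom_boxplot(varwidth = TRUE, position = "identity") +
  # Label the axis with the group1 levels at the integer positions
  scale_x_continuous(breaks = seq_along(levels(data$group1)),
                     labels = levels(data$group1), name = "group1")
print(p)
```

With these offsets the two boxes inside each group1 level sit 0.4 apart, while neighbouring levels are 0.6 apart, so the cross-classification stays visible. Changing the values in offset adjusts the spacing within each level.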
– Chris