13
votes

I am making boxplots with ggplot2 from data that is classified by 2 factor variables. I'd like the box widths to reflect sample size via varwidth = TRUE, but when I do this the boxes overlap.

1) Some sample data with a 3 x 2 structure

library(ggplot2)
data <- data.frame(group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
                   group2 = sample(c("D", "E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))

2) Default boxplots: ggplot without variable width

ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot()


I like how the first level of grouping is shown.
Now I try to add variable widths...

3) ...and what I get when varwidth = TRUE

ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot(varwidth = TRUE)


This overlap seems to occur whether I use color = group2 or group = group2 in both the main call to ggplot and in the geom_boxplot statement. Fussing with position_dodge doesn't seem to help either.

4) A solution I don't like visually is to make unique factors by combining my group1 and group2

data$grp.comb <- paste(data$group1, data$group2)

ggplot(data = data, aes(y = response, x = grp.comb, color = group2)) + geom_boxplot()


I prefer having things grouped to reflect the cross classification.

5) The way forward: I'd like to either a) make varwidth = TRUE not cause the boxes to overlap, or b) manually adjust the space between the combined groups so that boxes within the first level of grouping are closer together.

3
This is not a solution to the stated problem, but I would add that overlapping boxes are probably better from an interpretation perspective, because you can compare widths directly instead of having to measure them out in your head. To see both boxes, I would use the alpha argument here instead: ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot(varwidth = TRUE, alpha = 0.5) - Chris
See my solution below. - TClavelle

3 Answers

2
votes

I think your problem can be solved best by using facet_wrap.

    library(ggplot2)
    data <- data.frame(group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
                       group2 = sample(c("D", "E"), 100, replace = TRUE),
                       response = rnorm(100, mean = 0, sd = 1))

    ggplot(data = data, aes(y = response, x = group2, color = group2)) + 
      geom_boxplot(varwidth = TRUE) +
      facet_wrap(~group1)

This gives one panel per level of group1, with the boxes for D and E drawn side by side at variable width.
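
To see what the box widths stand for, it may help to tabulate the sample size in each group1 x group2 cell. The geom_boxplot documentation describes varwidth = TRUE as drawing widths proportional to the square roots of the group sizes, so the sketch below computes that ratio by hand. The seed and the names counts and rel_width are only illustrative and are not part of the answer above.

```r
set.seed(1234)  # arbitrary seed, only so the counts are reproducible
data <- data.frame(group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
                   group2 = sample(c("D", "E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))

# Number of observations in each group1 x group2 cell
counts <- table(data$group1, data$group2)
counts

# Width of each box relative to the widest one, following the
# square-root rule stated in the geom_boxplot documentation
rel_width <- sqrt(counts) / sqrt(max(counts))
round(rel_width, 2)
```

The cell with the most observations gets a relative width of 1, and every other box is scaled down from it.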

1
votes

A recent update to ggplot2 makes it so that the code provided by @N Brouwer in (3) works as expected:

# library(devtools)
# install_github("tidyverse/ggplot2")

packageVersion("ggplot2") # works with v2.2.1.9000
library(ggplot2)
set.seed(1234)
data <- data.frame(group1= sample(c("A","B","C"), 100, replace = TRUE),
                   group2= sample(c("D","E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))

ggplot(data = data, aes(y = response, x = group1, color = group2)) + 
  geom_boxplot(varwidth = TRUE)
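
If the installed version is ggplot2 3.0.0 or later (an assumption; the code above was only reported to work on the 2.2.1.9000 development build), the dodging can also be requested explicitly through position_dodge2(). This is a sketch rather than a confirmed fix for every version, and the padding value of 0.2 is an arbitrary choice for illustration.

```r
library(ggplot2)

set.seed(1234)
data <- data.frame(group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
                   group2 = sample(c("D", "E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))

# preserve = "single" keeps each box's own width while dodging;
# padding sets the gap between boxes that share an x position
# (0.2 is an illustrative value, not a recommendation)
p <- ggplot(data = data, aes(y = response, x = group1, color = group2)) +
  geom_boxplot(varwidth = TRUE,
               position = position_dodge2(preserve = "single", padding = 0.2))
p
```

Raising padding moves the D and E boxes within a group1 level further apart, and lowering it pulls them together.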


0
votes

This question has been answered here: "ggplot increase distance between boxplots".

The answer involves using the position = position_dodge() argument of geom_boxplot().

For your example:

data <- data.frame(group1 = sample(c("A", "B", "C"), 100, replace = TRUE),
                   group2 = sample(c("D", "E"), 100, replace = TRUE),
                   response = rnorm(100, mean = 0, sd = 1))

ggplot(data = data, aes(y = response, x = group1, color = group2)) + 
 geom_boxplot(position = position_dodge(1))
