10
votes

I have been having issues with what seems to be a simple thing to do: grouped boxplots with a continuous x axis.

Here is some minimal data:

df <- cbind(expand.grid(x=1:10, rep=1:20, fill=c("A", "B")), y=runif(400))

And here is what I want; you will see I have forced the x axis to be discrete:

ggplot(df, aes(x=as.factor(x), y=y,  fill=fill)) + geom_boxplot()

This is what I get when I leave x as continuous, without a grouping:

ggplot(df, aes(x=x, y=y,  fill=fill)) + geom_boxplot()

When I add a grouping, the color disappears:

ggplot(df, aes(x=x, y=y, group=x, fill=fill)) + geom_boxplot()

To be clear, what I would want in geom_point would be:

ggplot(df, aes(x=x, y=y, group=x, color=fill)) + geom_point(position=position_dodge(width=.7))

...but if I try to set a dodge in boxplot:

ggplot(df, aes(x=x, y=y, color=fill)) + geom_boxplot(position=position_dodge(width=.7))

Any tips? I have tried searching around: this question addressed continuous boxplots, but without the coloring issue; this question makes me wonder if I need to set an interaction, but that doesn't seem to give the desired results. Any help would be hugely appreciated!

2
I know this is not what you asked, but depending on the data you can achieve something similar by grouping x into intervals: ggplot(df, aes(x=cut_interval(x=x, length=1), y=y, fill=fill)) + geom_boxplot() - S Rivero

2 Answers

15
votes

From ?aes_group_order:

By default, the group is set to the interaction of all discrete variables in the plot.

In your data, you only have one discrete variable, "fill"; "x" is continuous, so it is not part of the default grouping. However, we want the data to be grouped by both "fill" and "x". Thus, we need to specify the desired grouping explicitly using the group aesthetic. And yes, you were correct: interaction() is the way to go.

First, a slightly smaller data set (easier to link data to output):

d <- data.frame(x = rep(c(1, 2, 4), each = 8),
                grp = rep(c("a", "b"), each = 4),
                y = sample(24))

Then the plot, where we group data by the different combinations of "x" and "grp" (interaction(x, grp)), and fill the boxes by "grp":

ggplot(d, aes(x = x, y = y, group = interaction(x, grp), fill = grp)) +
  geom_boxplot()

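To check what the grouping actually does, here is a minimal sketch that counts the boxes ggplot2 computes with and without the explicit group. It uses layer_data() from ggplot2, which returns the computed data for a layer (one row per box here). The set.seed() call and the names p_default, p_grouped, n_default and n_grouped are my own additions for illustration, not part of the answer above.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
d <- data.frame(x = rep(c(1, 2, 4), each = 8),
                grp = rep(c("a", "b"), each = 4),
                y = sample(24))

# Default grouping: only the discrete variable "grp" defines the groups
p_default <- ggplot(d, aes(x = x, y = y, fill = grp)) +
  geom_boxplot()

# Explicit grouping: one group per combination of x and grp
p_grouped <- ggplot(d, aes(x = x, y = y, group = interaction(x, grp), fill = grp)) +
  geom_boxplot()

# layer_data() returns one row per computed box
n_default <- nrow(suppressWarnings(layer_data(p_default)))
n_grouped <- nrow(layer_data(p_grouped))

n_default  # one box per level of grp
n_grouped  # one box per x/grp combination
```

With this data the default plot should collapse to one box per level of "grp", while the grouped plot has one box for each of the 3 x values times 2 groups.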

1
votes

Here is a version that works, custom built for your own cut sizes:

Take the original df:

df <- cbind(expand.grid(x=1:10, rep=1:20, fill=c("A", "B")), y=runif(400))

Use cut() to bin x into the groups you want, and position = "dodge2" to place the boxes side by side:

ggplot(df, aes(x = cut(x, 5), y = y, fill = fill)) +
        geom_boxplot(position = "dodge2", outlier.alpha = 0.1)

Boxplot with 5 custom groups with equal cuts between 1:10
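To see which x values end up in which box, it may help to look at the bins on their own before plotting. This is a small sketch using base R's cut() and table(); the unequal breaks c(0, 2, 5, 10) and the names bins_equal and bins_custom are hypothetical, chosen only to illustrate custom cut sizes.

```r
x <- 1:10

# cut(x, 5) splits the range of x into 5 equal-width bins
bins_equal <- cut(x, 5)
table(bins_equal)

# Hypothetical custom breaks: three bins of unequal width,
# covering (0, 2], (2, 5] and (5, 10]
bins_custom <- cut(x, breaks = c(0, 2, 5, 10))
table(bins_custom)
```

The same breaks vector can be passed inside aes(), for example aes(x = cut(x, breaks = c(0, 2, 5, 10)), y = y, fill = fill). Note that the x axis becomes discrete this way, so the spacing between boxes no longer reflects the numeric distance between bins.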