I have the following data frame

group1 = c('a', 'b')
group2 = c('1', '1', '2', '2')
mean = 1:4
sd = c(0.2, 0.3, 0.5, 0.8)
df = data.frame(group1, group2, mean, sd)

I want to plot the sd on the graph with geom_errorbar(). This works perfectly:

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = 'dodge')

As I wanted to reduce the width of the error bars, I ran:

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge')

So far so good. But then I wanted to fill by group2.

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge')

The problem is that the error bars are no longer centred on the bars, and I have no idea why. I looked at the documentation but found nothing on this issue. I also looked at the questions "Force error bars to be in the middle of bar" and "Aesthetics issue using position dodge in ggplots with geom_errorbar", but neither explains why this occurs. One proposed solution is to use position_dodge(0.9):

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = position_dodge(0.9))

It worked, but I have no idea why or what it did. Can someone explain what is going on? Why can't I just add width = 0.2 to reduce the width of the error bars? What does position_dodge(0.9) do, and why do I need it? Why does the problem occur only after I add fill = group2?

1 Answer

TL;DR: From the start, position = "dodge" (or position = position_dodge(<some width value>)) wasn't doing what you thought it was doing.

Underlying intuition

position_dodge is one of the position-adjusting functions available in the ggplot2 package. If there are multiple elements belonging to different groups occupying the same location, position_identity would do nothing at all, position_dodge would place the elements side by side horizontally, position_stack would place them on top of one another vertically, position_fill would place them on top of one another vertically & stretch proportionally to fit the whole plot area, etc.

Here's a summary of different position-adjusting functions' behaviours, from RStudio's ggplot2 cheat sheet:

[Image: position adjustments from the ggplot2 cheat sheet]

Note that the elements to be dodged / etc. must belong to different groups. If group = <some variable> is specified explicitly in a plot, that would be used as the grouping variable for determining which elements should be dodged / etc. from one another. If there's no explicit group mapping in aes(), but there's one or more of color = <some variable> / fill = <some variable> / linetype = <some variable> / and so on, the interaction of all discrete variables would be used. From ?aes_group_order:

By default, the group is set to the interaction of all discrete variables in the plot. This often partitions the data correctly, but when it does not, or when no discrete variable is used in the plot, you will need to explicitly define the grouping structure, by mapping group to a variable that has a different value for each group.
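To see this grouping rule in action, here is a minimal sketch that inspects the group column ggplot2 computes for the question's data. The helper groups_per_x is a hypothetical name introduced only for this illustration, and position = "identity" is used so that the x values stay at their original positions while we count groups.

```r
library(ggplot2)

# Same data as in the question, with group1 written out in full
df <- data.frame(
  group1 = c("a", "b", "a", "b"),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

# Hypothetical helper: number of distinct groups at each x position
# in the first layer of a built plot
groups_per_x <- function(p) {
  d <- ggplot_build(p)$data[[1]]
  tapply(d$group, as.numeric(d$x), function(g) length(unique(g)))
}

# Only x is discrete: both rows at each x position share one group
p_no_fill <- ggplot(df, aes(x = group1, y = mean)) +
  geom_col(position = "identity")

# x and fill are discrete: the rows at each x position fall into two groups
p_fill <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "identity")

groups_per_x(p_no_fill)
groups_per_x(p_fill)
```

Without fill there is nothing to dodge within an x position; with fill = group2 there are two groups per position, which is what gives position_dodge something to separate.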

Plot by plot breakdown

Let's start with your original plot. As no aesthetic other than x was mapped to a discrete variable, the two rows at each x position belonged to the same group, so position = "dodge" had nothing to separate and did absolutely nothing.

We can replace that with position = "identity" for both geom layers (in fact, position = "identity" is the default position for geom_errorbar, so there's no need to spell it out), and the resulting plot would be the same.

Increasing the transparency makes it obvious that the two bars are occupying the same spot, one "behind" another.

I guess this original plot isn't what you actually intended? There are really very few scenarios where it would make sense for one bar to be behind another like this...

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = 'dodge') +
  ggtitle("original plot")

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = "identity") + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  ggtitle("remove position dodge")

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = "identity", alpha = 0.5) + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  ggtitle("increase transparency")

[Image: illustration 1]

I'll skip over the second plot, since adding width = 0.2 didn't change anything fundamental.

In the third plot, position = "dodge" finally has something to do, because fill = group2 splits the rows at each x position into separate groups. The bars and error bars are each dodged according to their own widths. This is the expected behaviour when position = "dodge" is used rather than position = position_dodge(width = <some value>, ...): the distance dodged follows the geom layer's width by default, unless it is overridden by a specific value in position_dodge(width = ...). Here the bars dodge by their default width while the error bars dodge by 0.2, so the two layers no longer line up.

If the geom_errorbar layer kept to its default width (which is the same as the default width for geom_col), both layers' elements would have been dodged by the same amount.

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge') +
  ggtitle("third plot")

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), 
                position = 'dodge') +
  ggtitle("with default width")

[Image: illustration 2]
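If you prefer numbers to pictures, the misalignment can also be checked by comparing the x centres that each layer ends up with after dodging. This is only a sketch: layer_centres is a hypothetical helper name, and it relies on ggplot_build() exposing an x column for both layers.

```r
library(ggplot2)

df <- data.frame(
  group1 = c("a", "b", "a", "b"),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

# Hypothetical helper: sorted x centres of one layer after position adjustment
layer_centres <- function(p, layer) {
  sort(as.numeric(ggplot_build(p)$data[[layer]]$x))
}

base <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "dodge")

# Error bars dodged by their own width (0.2)
p_default <- base +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = "dodge")

# Error bars dodged by an explicit 0.9
p_fixed <- base +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = position_dodge(0.9))

# Layer 1 is geom_col, layer 2 is geom_errorbar
rbind(col = layer_centres(p_default, 1), errorbar = layer_centres(p_default, 2))
rbind(col = layer_centres(p_fixed, 1),   errorbar = layer_centres(p_fixed, 2))
```

In the first comparison the two rows should differ; in the second they should match, which is what "in the middle of the bar" means numerically.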

Side note: We know both geom_errorbar & geom_col have the same default width, because they set up their data in the same way. The following line of code can be found in both GeomErrorbar$setup_data / GeomCol$setup_data:

data$width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
# i.e. if width is specified as one of the aesthetic mappings, use that;
#      else if width is specified in the geom layer's parameters, use that;
#      else, use 90% of the dataset's x-axis variable's resolution.        <- default value of 0.9

In conclusion, when you have different aesthetic groups, the width specified in position_dodge determines the distance moved by each element, while the width specified in each geom layer determines each element's... well, width. As long as different geom layers dodge by the same amount, they will be aligned with one another. That is why position_dodge(0.9) fixed your plot: 0.9 matches the default width that geom_col dodges by.
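The arithmetic behind "dodging by the same amount" can be sketched by hand. The function below is a simplified stand-in written for this answer, not ggplot2's own code: it assumes n groups share one x position and spreads their centres evenly across the dodge width.

```r
# Hypothetical helper (not part of ggplot2): centres of n dodged elements
# that share the x position `x`, given a total dodge width
dodge_centres <- function(x, n, dodge_width) {
  x + dodge_width * ((seq_len(n) - 0.5) / n - 0.5)
}

# Two groups at x = 1, dodged by the bars' default width of 0.9
bar_centres <- dodge_centres(1, 2, 0.9)

# The same two groups dodged by the error bars' own width of 0.2
errorbar_centres_narrow <- dodge_centres(1, 2, 0.2)

# Error bars dodged by 0.9 instead, whatever their drawn width is
errorbar_centres_fixed <- dodge_centres(1, 2, 0.9)

bar_centres
errorbar_centres_narrow
errorbar_centres_fixed
```

Under this assumption the narrow dodge keeps the error bars close to the middle of the x position, while the bars sit further out; using the same dodge width for both puts the centres in the same places.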

Below is a random example for illustration, which uses different width values for each layer (0.5 for geom_col, 0.9 for geom_errorbar), but the same dodge width (0.6):

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = position_dodge(0.6), width = 0.5) + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.9,
                position = position_dodge(0.6)) +
  ggtitle("another example")

[Image: example]