2
votes

I'm trying to produce a stacked barplot with an error bar which represents the total variability per bar. I don't want to use a dodged barplot as I have >10 categories per bar.

Below I have some sample data for a reproducible example:

scenario = c('A','A','A','A')
strategy = c('A','A','A','A')
decile = c(0,0,10,10)
asset = c('A','B','A','B')
lower = c(10,20,10, 15)
mean = c(30,50,60, 70)
upper = c(70,90,86,90)
data = data.frame(scenario, strategy, decile, asset, lower, mean, upper)

And once we have the data df we can use ggplot2 to create a stacked bar as so:

ggplot(data, aes(x=decile, y=mean, fill=asset)) + 
  geom_bar(stat="identity") +
  facet_grid(strategy~scenario) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)

However, the error bars produced are for each individual component of each stacked bar:

[Image: stacked bar chart with a separate error bar drawn for each component of each bar]

I appreciate this results from me providing the lower, mean and upper for each row of the df, but even when I summed these per decile I didn't get my desired errorbars at the top of each bar stack.

What is the correct ggplot2 code, or alternatively, what is the correct data structure to enable this?

2
Maybe this is what you are looking for:

ggplot(data, aes(x=factor(decile), y=mean, fill=asset, group=scenario)) +
  geom_bar(stat="identity") +
  facet_grid(strategy~scenario) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)

(comment by Duck)

2 Answers

2
votes

I think you're correct in realising you need to manipulate your data rather than your plot. You can't really have position_stack on an errorbar, so you'll need to recalculate the mean, upper and lower values for the errorbars. Essentially this means getting the cumulative sum of the mean values, and shifting the upper and lower ranges accordingly. You can do this inside a dplyr pipe.

Note I think you will also need to have a position_dodge on the error bars, since their range overlaps even when shifted appropriately, which will make them harder to interpret visually:

library(ggplot2)
library(dplyr)

data %>% 
  mutate(lower = lower - mean, upper = upper - mean) %>%
  group_by(decile) %>% 
  arrange(desc(asset), .by_group = TRUE) %>%  # sort within each decile, bottom segment first
  mutate(mean2 = cumsum(mean), lower = lower + mean2, upper = upper + mean2) %>%
  ggplot(aes(x = decile, y = mean, fill = asset)) + 
  geom_bar(stat = "identity") +
  facet_grid(strategy ~ scenario) +
  geom_errorbar(aes(y = mean2, ymin = lower, ymax = upper), width = 2,
                position = position_dodge(width = 2)) +
  geom_point(aes(y = mean2), position = position_dodge(width = 2))

[Image: stacked bar chart with a dodged error bar and point for each asset, shifted to the cumulative mean]
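To make the shifting step concrete, here is a minimal sketch that runs only the data manipulation on the sample data from the question and inspects the result. The name `shifted` is introduced here for illustration, and the sort uses `desc(asset)` with `.by_group = TRUE` so that the bottom segment of each stack (asset B) comes first in the cumulative sum.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  scenario = "A", strategy = "A",
  decile = c(0, 0, 10, 10),
  asset  = c("A", "B", "A", "B"),
  lower  = c(10, 20, 10, 15),
  mean   = c(30, 50, 60, 70),
  upper  = c(70, 90, 86, 90)
)

shifted <- data %>%
  # Express the limits as offsets from each segment's own mean
  mutate(lower = lower - mean, upper = upper - mean) %>%
  group_by(decile) %>%
  # ggplot2 stacks the last factor level at the bottom, so B comes first
  arrange(desc(asset), .by_group = TRUE) %>%
  # mean2 is the top of each segment; re-centre the offsets on it
  mutate(mean2 = cumsum(mean), lower = lower + mean2, upper = upper + mean2) %>%
  ungroup()

print(shifted[, c("decile", "asset", "mean", "mean2", "lower", "upper")])
# decile  0: B mean2 = 50,  lower = 20, upper = 90;  A mean2 = 80,  lower = 60, upper = 120
# decile 10: B mean2 = 70,  lower = 15, upper = 90;  A mean2 = 130, lower = 80, upper = 156
```

Each error bar is now centred on the top of its own segment, which is why the ranges of neighbouring segments overlap and a dodge helps readability.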

1
votes

If you want only one error bar per decile, you should aggregate the values so that there is no difference between assets, like this:

library(ggplot2)
library(dplyr)
#Code
data %>% group_by(scenario,decile) %>% 
  mutate(nlower=mean(lower),nupper=mean(upper)) %>%
  ggplot(aes(x=factor(decile), y=mean, fill=asset,group=scenario)) + 
  geom_bar(stat="identity") +
  facet_grid(strategy~scenario) +
  geom_errorbar(aes(ymin = nlower, ymax = nupper), width = 0.25)

Output:

[Image: stacked bar chart with a single error bar per decile]

Things are different if you keep the per-asset limits: each asset then gets its own error bar, because each one has different values:

#Code 2
data %>%
  ggplot(aes(x=factor(decile), y=mean, fill=asset,group=scenario)) + 
  geom_bar(stat="identity") +
  facet_grid(strategy~scenario) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)

Output:

[Image: stacked bar chart with one error bar per asset]

In this last version each asset has its own error bar. If you want to show the error for the whole bar, aggregate the limits per decile first, as was done above with the mean of the limits, or use whatever other measure you prefer.
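As one possible way to aggregate, the sketch below sums the limits per decile into a separate data frame and draws a single error bar from it, on top of the full stack. Summing `lower` and `upper` is only an assumption about what "total variability" should mean here (it is not a statistically neutral choice), and the names `totals`, `total_lower`, `total_mean` and `total_upper` are hypothetical. The key point is `inherit.aes = FALSE`, so the error-bar layer does not pick up `y` and `fill` from the bar layer.

```r
library(ggplot2)
library(dplyr)

# Sample data from the question
data <- data.frame(
  scenario = "A", strategy = "A",
  decile = c(0, 0, 10, 10),
  asset  = c("A", "B", "A", "B"),
  lower  = c(10, 20, 10, 15),
  mean   = c(30, 50, 60, 70),
  upper  = c(70, 90, 86, 90)
)

# One row per bar: aggregate over assets (summing is an assumed choice)
totals <- data %>%
  group_by(scenario, strategy, decile) %>%
  summarise(total_lower = sum(lower),
            total_mean  = sum(mean),
            total_upper = sum(upper),
            .groups = "drop")

p <- ggplot(data, aes(x = factor(decile), y = mean, fill = asset)) +
  geom_col() +
  facet_grid(strategy ~ scenario) +
  # Separate data and aesthetics, so there is exactly one bar per decile
  geom_errorbar(data = totals,
                aes(x = factor(decile), ymin = total_lower, ymax = total_upper),
                width = 0.25, inherit.aes = FALSE)

print(p)
```

With the sample data, `totals` has one row per decile: limits of 30 to 160 around a stack height of 80 for decile 0, and 25 to 176 around 130 for decile 10. Swapping `sum()` for another summary only requires changing the `summarise()` call.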