
I'm trying to create a stacked bar plot from raw data, where each combination of factor levels may have multiple rows and the y-value should be the sum over all of those rows. A plain geom_bar looks fine at first, but it draws each original row as a separate rectangle and stacks them on top of each other. That is okay until you want to draw a frame around each part of the bar:

library(tidyverse)
data = tibble(
    age = factor(c(2, 3, 3, 3, 2, 2)),
    value = c(30, 5, 15, 14, 29, 9)
)
ggplot(data, aes(x = "Observation", y = value, fill = age)) +
    geom_bar(stat = "identity", colour = "black")


What I actually want is one frame around the whole turquoise part and one around the whole red part. How can I do this directly in ggplot?

Of course, one can manually call summarize:

ggplot(data %>% group_by(age) %>% summarize(value = sum(value)), 
  aes(x = "Observation", y = value, fill = age)) +
  geom_bar(stat = "identity", colour = "black")

But that needs to be adjusted for every different choice of axis variables, which will be a pain: I'm working with ~15 factor dimensions and have to create dozens of charts, each with different factor variables on the axes (including facet_grid).

So ideally, ggplot / geom_bar would automatically do the aggregation and then draw the aggregated value rather than each individual entry separately. Is this possible?

1 Answer


stat_summary() can handle these types of summaries on the fly.

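You just specify the geom and the function to summarize with. Here we also need to ask for stacking explicitly, because stat_summary() uses position = "identity" by default and the summarized bars would otherwise be drawn on top of each other.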

ggplot(data, aes(x = "Observation", y = value, fill = age)) +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(geom = "bar", fun = "sum", position = "stack")

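
To get the frames the question asks for, the outline colour can be passed to the same layer. The following is a minimal sketch rather than a definitive recipe: it assumes ggplot2 3.3.0 or later (for the `fun` argument), rebuilds the question's data as a plain data frame, and uses `layer_data()` only to inspect what ends up being drawn. The names `p` and `rects` are illustrative.

```r
library(ggplot2)

# Same example data as in the question, as a base data frame
data <- data.frame(
  age   = factor(c(2, 3, 3, 3, 2, 2)),
  value = c(30, 5, 15, 14, 29, 9)
)

# Sum the values per x/fill group, stack the summed bars,
# and outline each summed bar in black
p <- ggplot(data, aes(x = "Observation", y = value, fill = age)) +
  stat_summary(geom = "bar", fun = sum, position = "stack", colour = "black")

# Inspect the rectangles of the first layer: there should be one row
# per age level instead of one row per original entry
rects <- layer_data(p)
nrow(rects)

# Heights of the outlined rectangles; these should equal the per-age sums
rects$ymax - rects$ymin

print(p)
```

Because the summary is computed per panel and per group, the same layer should carry over to other x and fill mappings, and to facet_grid, without a separate summarize call for each chart.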