0
votes

Using ggplot2, I want to fill the bars of a barplot that shows the relative frequencies of one categorical variable (i) in two differently sized groups (g = "A", "B") with a third categorical variable (f). The bars within each group should sum to 100%.

Here's a reproducible example and what I've tried so far:

library(ggplot2)
set.seed(7)
g <- sample(c("A", "B"), 100, replace=TRUE, prob=c(0.7, 0.3)) 
i <- sample(c("C1", "C2"), 100, replace=TRUE)
f <- sample(c("X", "Y", "Z"), 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
df <- data.frame(g, i, f)


p1 <- ggplot(df, aes(x=i, y=stat(prop)))+
  geom_bar(aes(group = g, fill = f))+
  facet_grid(~g)
p1

However, the fill aesthetic has no effect on this plot (all bars are grey).

So I tried some code found here that creates groups from two variables. The resulting barplot comes close to what I want and is filled by the third variable, but now the percentages no longer add up to 100% (i.e., 1):

p2 <- ggplot(df, aes(x=i, y=stat(prop)))+
  geom_bar(aes(group = interaction(g, f), fill = f))+
  facet_grid(~g)
p2

Although this problem sounds very similar, applying its code to a stacked and grouped barplot only reproduces the problems described above.

Any help appreciated - a pure ggplot2 solution would be awesome, though.


2 Answers

2
votes

Maybe computing the proportion in a dplyr pipeline can be useful:

set.seed(7)
library(ggplot2)
library(dplyr)
#Data
g <- sample(c("A", "B"), 100, replace=TRUE, prob=c(0.7, 0.3)) 
i <- sample(c("C1", "C2"), 100, replace=TRUE)
f <- sample(c("X", "Y", "Z"), 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
df <- data.frame(g, i, f)
#Count each (i, g, f) cell, convert counts to proportions within each bar (i, g), then plot
df %>% group_by(i,g,f) %>%
  summarise(N=n()) %>%
  group_by(i,g,.drop=TRUE) %>%
  mutate(Prop=N/sum(N)) %>%
  ggplot(aes(x=i))+
  geom_bar(stat='identity',aes(y=Prop, fill = f))+
  scale_y_continuous(labels = scales::percent)+
  facet_grid(~g)

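Note that the pipeline above normalises within each bar (each combination of i and g), so every bar reaches 100%. If the intent is instead that the bars inside each facet together sum to 100%, as in p1 from the question, a minimal sketch would normalise by g alone. The names `props_by_group` and `p_group` are mine, and `count(..., name = )` assumes a reasonably recent dplyr:

```r
library(ggplot2)
library(dplyr)

# Same simulated data as in the question
set.seed(7)
g <- sample(c("A", "B"), 100, replace = TRUE, prob = c(0.7, 0.3))
i <- sample(c("C1", "C2"), 100, replace = TRUE)
f <- sample(c("X", "Y", "Z"), 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
df <- data.frame(g, i, f)

# Count each (g, i, f) cell, then divide by the total of its group g,
# so that all bar segments within one facet add up to 1
props_by_group <- df %>%
  count(g, i, f, name = "N") %>%
  group_by(g) %>%
  mutate(Prop = N / sum(N)) %>%
  ungroup()

p_group <- ggplot(props_by_group, aes(x = i, y = Prop, fill = f)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~g)
p_group
```

With this variant the two bars in facet A (and likewise in facet B) are generally of different heights, and their stacked segments sum to 100% per facet.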

0
votes

A shorter alternative is to use count and position_fill:

library(ggplot2)
library(dplyr)

df %>% 
  count(g, i, f) %>%
  ggplot(aes(i, n, fill = f)) +
  geom_col(position = position_fill()) +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~g)

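Since the question asks for a pure ggplot2 solution, here is a sketch without dplyr. `p_fill` lets `geom_bar()` count the rows itself and should give the same picture as the `count()` version. `p_facet` follows my reading of the original requirement (bars within each facet sum to 100%) and divides each count by its panel total. It assumes ggplot2 >= 3.3.0 for `after_stat()` and relies on the `PANEL` column being present in the computed layer data, which is an assumption on my side and not something taken from the answers. The object names are hypothetical.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(7)
g <- sample(c("A", "B"), 100, replace = TRUE, prob = c(0.7, 0.3))
i <- sample(c("C1", "C2"), 100, replace = TRUE)
f <- sample(c("X", "Y", "Z"), 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
df <- data.frame(g, i, f)

# Variant 1: every bar is rescaled to 100% (same idea as position_fill() above)
p_fill <- ggplot(df, aes(x = i, fill = f)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~g)

# Variant 2: each count is divided by the total count of its facet (PANEL),
# so the stacked bars within one facet sum to 100%
p_facet <- ggplot(df, aes(x = i, fill = f)) +
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~g)

p_fill
p_facet
```

Variant 2 keeps the default stacking, so the fill by f is preserved while the bar heights still reflect the relative frequency of i within each group.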