19
votes

I have a dataframe d:

> head(d,20)
   groupchange Symscore3
1            4         1
2            4         2
3            4         1
4            4         2
5            5         0
6            5         0
7            5         0
8            4         0
9            2         2
10           5         0
11           5         0
12           5         1
13           5         0
14           4         1
15           5         1
16           1         0
17           4         0
18           1         1
19           5         0
20           4         0

That I am plotting with:

ggplot(d, aes(groupchange, y=..count../sum(..count..),  fill=Symscore3)) +
  geom_bar(position = "dodge") 

This way, each bar shows its share of the whole dataset.

Instead, I would like each bar to represent a relative percentage; i.e. the bars obtained for groupchange = k should sum to 1.

2
Please consider updating the answer to reflect the more accurate and succinct answer below, which uses position = "fill", especially since the question asks specifically about the ggplot package. Otherwise, people end up summarizing manually even though geom_bar computes the proportions itself when position = "fill" is used. Updating the selected answer would keep inefficient approaches from persisting across the community. I wanted to bring this to your and the community's attention. - HoneyBuddha
@HoneyBuddha I disagree that my approach is inefficient; it depends on the circumstances, imo. For this simple use case, you might be right. However, when working with large datasets it is (in my experience) more efficient to summarise first and then plot. Also, when the summarisation is a bit more complex than a straightforward percentage, it is better to summarise first and then plot. - Jaap

2 Answers

34
votes

First summarise and transform your data:

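library(dplyr)
d2 <- d %>% 
  group_by(groupchange, Symscore3) %>% 
  # one row per (groupchange, Symscore3) pair with its row count
  summarise(count = n()) %>% 
  # summarise() drops the last grouping level, so the data is still grouped
  # by groupchange and sum(count) is the total within each groupchange
  mutate(perc = count/sum(count))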
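
The step above relies on summarise() leaving the data grouped by groupchange. As a rough sketch of the same idea with the grouping spelled out, the snippet below uses count() followed by an explicit group_by(), then checks that the shares add up to 1 within each groupchange. The data frame is re-typed from the head(d, 20) output in the question, and the name totals is only illustrative.

```r
library(dplyr)

# First 20 rows from the question, re-typed so the snippet runs on its own
d <- data.frame(
  groupchange = c(4, 4, 4, 4, 5, 5, 5, 4, 2, 5, 5, 5, 5, 4, 5, 1, 4, 1, 5, 4),
  Symscore3   = c(1, 2, 1, 2, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

d2 <- d %>%
  # one row per (groupchange, Symscore3) pair, row count stored in `count`
  count(groupchange, Symscore3, name = "count") %>%
  # group explicitly so that sum(count) is the total per groupchange
  group_by(groupchange) %>%
  mutate(perc = count / sum(count)) %>%
  ungroup()

# Sanity check: the shares within each groupchange should add up to 1
totals <- tapply(d2$perc, d2$groupchange, sum)
totals
```

If any entry of totals differs from 1, the proportions were most likely computed over the whole data rather than within each group.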

Then you can plot it:

ggplot(d2, aes(x = factor(groupchange), y = perc*100, fill = factor(Symscore3))) +
  geom_bar(stat="identity", width = 0.7) +
  labs(x = "Groupchange", y = "percent", fill = "Symscore") +
  theme_minimal(base_size = 14)


Alternatively, you can use the percent function from the scales package:

brks <- c(0, 0.25, 0.5, 0.75, 1)

ggplot(d2, aes(x = factor(groupchange), y = perc, fill = factor(Symscore3))) +
  geom_bar(stat="identity", width = 0.7) +
  scale_y_continuous(breaks = brks, labels = scales::percent(brks)) +
  labs(x = "Groupchange", y = NULL, fill = "Symscore") +
  theme_minimal(base_size = 14)
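
The question used position = "dodge", whereas the plots above stack the bars. If side-by-side bars are preferred, something along these lines should work with the summarised data. It is a sketch rather than the only way to do it: the data is re-typed from the question, geom_col() is used as shorthand for geom_bar(stat = "identity"), and scales::percent is passed as a function so the breaks do not need to be hard-coded.

```r
library(dplyr)
library(ggplot2)

# First 20 rows from the question, re-typed so the snippet runs on its own
d <- data.frame(
  groupchange = c(4, 4, 4, 4, 5, 5, 5, 4, 2, 5, 5, 5, 5, 4, 5, 1, 4, 1, 5, 4),
  Symscore3   = c(1, 2, 1, 2, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

# Share of each Symscore3 value within its groupchange
d2 <- d %>%
  count(groupchange, Symscore3, name = "count") %>%
  group_by(groupchange) %>%
  mutate(perc = count / sum(count)) %>%
  ungroup()

p <- ggplot(d2, aes(x = factor(groupchange), y = perc, fill = factor(Symscore3))) +
  # bars side by side; heights are the precomputed within-group shares
  geom_col(position = "dodge", width = 0.7) +
  # format the axis labels as percentages at whatever breaks ggplot picks
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Groupchange", y = NULL, fill = "Symscore") +
  theme_minimal(base_size = 14)
p
```

Within each groupchange the bar heights add up to 100%, but a group with fewer Symscore3 values will show fewer (and by default wider) bars.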


27
votes

If your goal is visualization in minimal code, use position = "fill" as an argument in geom_bar().

If you want within-group percentages, @Jaap's dplyr answer is the way to go.

Here is a reproducible example using the above dataset to copy/paste:

library(tidyverse)

# tibble() replaces the deprecated data_frame(); values match head(d, 20) in the question
d <- tibble(groupchange = c(4,4,4,4,5,5,5,4,2,5,5,5,5,4,5,1,4,1,5,4),
            Symscore3 = c(1,2,1,2,0,0,0,0,2,0,0,1,0,1,1,0,0,1,0,0))

ggplot(d, aes(x = factor(groupchange), fill = factor(Symscore3))) +
  geom_bar(position="fill")
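
position = "fill" always stacks the bars. If you want dodged bars as in the question, without summarising first, one possible approach is to compute the within-group share inside aes() with after_stat(). This is a sketch that assumes ggplot2 3.3.0 or later (where after_stat() replaced the ..count.. notation); it uses base R's ave() to divide each bar's count by the total count at the same x position. The data is re-typed from the question.

```r
library(ggplot2)

# First 20 rows from the question, re-typed so the snippet runs on its own
d <- data.frame(
  groupchange = c(4, 4, 4, 4, 5, 5, 5, 4, 2, 5, 5, 5, 5, 4, 5, 1, 4, 1, 5, 4),
  Symscore3   = c(1, 2, 1, 2, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0)
)

p <- ggplot(d, aes(x = factor(groupchange), fill = factor(Symscore3))) +
  geom_bar(
    # count and x are columns computed by stat_count();
    # ave(count, x, FUN = sum) repeats the per-x total for every bar at that x
    aes(y = after_stat(count / ave(count, x, FUN = sum))),
    position = "dodge"
  ) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Groupchange", y = NULL, fill = "Symscore")
p
```

This keeps everything in one ggplot call, at the cost of a less readable aes(); for anything more involved than a simple share, summarising first is probably easier to follow.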
