8
votes

I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the original count (using ggplot). Here is a MWE for the plot without labels:

library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol= 3,
                       data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
                                "north", "north", "north", "north", "south", "south", "south",
                                "A", "B", "B", "C", "A", "A", "C"),
                      byrow = FALSE))

colnames(df) <- c("ID", "region", "species")

p <- ggplot(df, aes(x = region, fill = species))
p  + geom_bar()

I have a much larger table and R counts quite nicely the different species for every region. Now, I would like to show both, the original count value (preferably on the y-axis) and the percentage (as label) to compare proportions of species between regions.

I tried out many things using geom_text() but I think the main difference to other questions (e.g. this one) is that

  • I do not have a separate column for y values (they are just the counts of different species per region) and
  • I need the labels per region to sum up to 100% (since they are considered to represent seperate populations), not all labels of the entire plot.

Any help is much appreciated!!

2
When you're doing something non-standard you usually need to compute the numbers yourself. It might be possible to do this inside ggplot, but it won't be straightforward. Better to use functions built for data manipulation then trying to do data manipulation within ggplot.Gregor Thomas

2 Answers

14
votes

As @Gregor mentioned, summarize the data separately and then feed the data summary to ggplot. In the code below, we use dplyr to create the summary on the fly:

library(dplyr)

ggplot(df %>% count(region, species) %>%    # Group by region and species, then count number in each group
         mutate(pct=n/sum(n),               # Calculate percent within each region
                ypos = cumsum(n) - 0.5*n),  # Calculate label positions
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%"), y=ypos))

enter image description here

Update: With dplyr 0.5 and later, you no longer need to provide a y-value to center the text within each bar. Instead you can use position_stack(vjust=0.5):

ggplot(df %>% count(region, species) %>%    # Group by region and species, then count number in each group
         mutate(pct=n/sum(n)),              # Calculate percent within each region
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")), 
            position=position_stack(vjust=0.5))
1
votes

I agree with Johanna. You could try:

d <- aggregate(.~region+species, df, length)
d$percent <- paste(round(ID/sum(ID)*100),'%',sep='')
ggplot(d, aes(region, ID, fill=species)) + geom_bar(stat='identity') + 
  geom_text(position='stack', aes(label=paste(round(ID/sum(ID)*100),'%',sep='')), vjust=5)