
I am trying to plot two factor variables and label the results with % inside the plots.

I already checked this post and the links it provides:

How to center stacked percent barchart labels

The ggplot line you are seeing here is actually from one of the recommended posts:

library(ggplot2)

sex <- c("F","F","M", "M", "M", "F","M","F","F", "M", "M", "M", "M","F","F", "M", "M", "F")
behavior <- c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "B", "C", "A")

BehSex <- data.frame(sex, behavior)

ggplot(BehSex, aes(x= factor(sex), fill= factor(behavior), y = (..count..)/sum(..count..)))+
  geom_bar() +
  stat_bin(geom = "text",
          aes(label = paste(round((..count..)/sum(..count..)*100), "%")),
          vjust = 5)

However, when I use that line I get the following error:

Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?

I tried using stat="count" inside the geom_bar() but it doesn't seem to work as expected.

Three questions:
1) What am I doing wrong?
2) How can I manage to plot what I want?
3) How can I plot: the % and then in another graph the counts?

Here's the plot that I have right now

Thank you in advance for your help!


3 Answers


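As in the answer to the post you mentioned, you need to place the percentage labels with position = position_stack(). The error itself comes from stat_bin(), which only accepts a continuous x variable; with a discrete x such as sex, the labels have to be computed with stat = "count" instead.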
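As a minimal sketch of that fix, assuming ggplot2 3.3.0 or later (where after_stat() replaces the older ..count.. notation), the text layer can use stat = "count" together with position_stack(). The object name p_pct is just an illustrative choice, and the data are the sample vectors from the question:

```r
library(ggplot2)

# Sample data copied from the question
sex <- c("F","F","M","M","M","F","M","F","F","M","M","M","M","F","F","M","M","F")
behavior <- c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","B","C","A")
BehSex <- data.frame(sex, behavior)

# p_pct is a hypothetical name for the percentage plot
p_pct <- ggplot(BehSex, aes(x = sex, fill = behavior)) +
  # Bar heights: share of each sex/behavior group in all observations
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  # Labels: same statistic, counted with stat = "count" (not stat_bin),
  # and centered inside each stacked segment
  geom_text(
    aes(y = after_stat(count / sum(count)),
        label = after_stat(paste0(round(100 * count / sum(count)), "%"))),
    stat = "count",
    position = position_stack(vjust = 0.5)
  ) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Sex", y = "Percentage", fill = "Behavior")

p_pct
```

Because both layers are stacked by the same position adjustment, vjust = 0.5 puts each label in the middle of its segment.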

You can also use the dplyr package to compute the percentages from your data frame first. In my opinion, this makes the labeling easier:

library(dplyr)

# Percent is computed within each sex, so every bar sums to 100
df <- BehSex %>% group_by(sex) %>% count(behavior) %>% mutate(Percent = n / sum(n)*100)

# A tibble: 6 x 4
# Groups:   sex [2]
  sex   behavior     n Percent
  <fct> <fct>    <int>   <dbl>
1 F     A            2    25  
2 F     B            3    37.5
3 F     C            3    37.5
4 M     A            4    40  
5 M     B            3    30  
6 M     C            3    30  

Then, you can get your plot like this:

ggplot(df, aes(x = sex, y = Percent, fill = behavior))+
  geom_bar(stat = "identity")+
  geom_text(aes(label = paste(Percent,"%"), y = Percent), 
            position = position_stack(vjust = 0.5))+
  labs(x = "Sex", y = "Percentage",fill = "Behavior")

Here's another approach using a bit of data prep with dplyr:

EDIT: added counts. To show one or the other just change the label.

BehSexSum <- BehSex %>%
  count(sex, behavior) %>%
  mutate(pct = n / sum(n),
         pct_label = scales::percent(pct))

ggplot(BehSexSum, aes(x= sex, fill = behavior, y = pct)) +
  geom_col() +
  geom_text(aes(label = paste(pct_label, n, sep = "\n")),
            lineheight = 0.8,
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
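For the third question (counts in a separate graph), the same summarised data can be plotted with n on the y-axis. This is only a sketch along the lines of the code above; the names BehSexCounts and p_counts are made up for illustration:

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question
sex <- c("F","F","M","M","M","F","M","F","F","M","M","M","M","F","F","M","M","F")
behavior <- c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","B","C","A")
BehSex <- data.frame(sex, behavior)

# One row per sex/behavior combination, with its count in column n
BehSexCounts <- BehSex %>%
  count(sex, behavior)

# p_counts is a hypothetical name for the counts plot
p_counts <- ggplot(BehSexCounts, aes(x = sex, y = n, fill = behavior)) +
  geom_col() +
  # Raw counts as labels, centered in each stacked segment
  geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
  labs(x = "Sex", y = "Count", fill = "Behavior")

p_counts
```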

I think an easier way to format the y-axis labels as percentages is scale_y_continuous(labels = scales::percent) instead of stat_bin(...). That way the code can stay almost the same, although this version only formats the axis and does not put labels inside the bars:

ggplot(BehSex, aes(x= factor(sex), fill= factor(behavior), y =(..count..)/sum(..count..)))+
  geom_bar() +
  #Set the y axis format as percentage
  scale_y_continuous(labels = scales::percent)+
  #Change the legend and axes names 
  labs(x = "Sex", y = "Percentage",fill = "Behavior")