0
votes

How can I add the percent of each axis category (not each fill group) to a stacked bar chart? For example, I have the following dataset:

df<-structure(list(age_group = structure(c(3L, 3L, 5L, 3L, 5L, 5L, 
5L, 3L, 5L, 5L, 4L, 4L, 4L, 3L, 5L), .Label = c("65+", "55-64", 
"45-54", "35-44", "25-34", "18-24"), class = "factor"), Gender = c("F", 
"M", "M", "M", "F", "M", "M", "M", "F", "M", "M", "F", "M", "F", 
"M")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-15L), .Names = c("age_group", "Gender"))

library(dplyr)    # for %>%, group_by, mutate
library(ggplot2)

dat <- aggregate(list(value = 1:NROW(df)), df[c("age_group", "Gender")], length)
dat$proportion <- ave(dat$value, dat$age_group, FUN = function(x) (x/sum(x)*100))
dat$proportionR <- round(dat$proportion, digits =0)

dat<-dat %>%
  group_by(age_group) %>%
  mutate(age_per = sum(value)) %>%
  ungroup() %>%
  mutate(age_per = round((age_per/sum(value))*100))

ggplot(dat, aes(x = age_group, y = value, fill = Gender)) +
  geom_col() + coord_flip() + ylab("Visits 2018-2019") +xlab("") +
  scale_fill_manual(values= c("#740404", "#AB6868", "#D5B3B3"), labels = c("Females", "Males", "N/A")) +
  theme(legend.title=element_blank()) +
  geom_text(aes(label = paste0(age_per, "%")), hjust = 2.7, position = "stack", color = "white", size =5)


What I would like is an automated way to add the total percent for each age group (the categories on the flipped axis) while disregarding the percentages within each group. My workflow identifies the correct percent but repeats it over each subgroup within the stack. I would like the geom_text to be placed in the white space right after the bar ends.

Just as a note, this question is not a duplicate of the SO question "Adding percentage labels to a bar chart in ggplot2", because this one deals with percents when there are stacked groups within each bar (the former covers only plain bar plots).

Also, emphasis on automated. I can do the following, but in my real data set I have many more age group intervals, which makes the approach below untenable.

ggplot(dat, aes(x = age_group, y = value, fill = Gender)) +
  geom_col() + coord_flip() + ylab("Visits 2018-2019") +xlab("") +
  scale_fill_manual(values= c("#740404", "#AB6868", "#D5B3B3"), labels = c("Females", "Males", "N/A")) +
  theme(legend.title=element_blank()) +
  geom_text(aes(y= 5.2, x=1, label = "33%"), color = "#740404", size =5) +
  geom_text(aes(y= 3.2, x=2, label = "20%"), color = "#740404", size =5) +
  geom_text(aes(y= 7.2, x=3, label = "47%"), color = "#740404", size =5) 


1
I'm a little bit confused, what are the exact numbers you want to add? - pogibas
@PoGibas I would like just a single number following the bar; that number should be the same as the repeated numbers you currently see in the first chart above. The percents are calculated as the total for each age group (disregarding gender) divided by the total number of values in the dataset. See additions above. - Cyrus Mohammadian

1 Answer

1
votes

Consider computing a per-group percentage and adding it with annotate. Because you need three labels while the data has six rows (one per age group and gender combination), annotate lets you place the labels independently of the grouped series. Also, label the bar segments with the appropriate age-group-and-gender percentages. Below, another base::ave call replaces your dplyr::group_by:

agg_df <- aggregate(list(value = 1:NROW(df)), df[c("age_group", "Gender")], length)

dat <- within(agg_df, {
  proportion <- ave(value, age_group, FUN = function(x) (x/sum(x)*100))
  proportionR <- round(proportion, digits=0)

  age_per <- round((ave(value, age_group, Gender, FUN=sum) / sum(value)) * 100)      
  grp_pct <- round((ave(value, age_group, FUN=sum) / sum(value)) * 100)
})

dat
#   age_group Gender value grp_pct age_per proportionR proportion
# 1     45-54      F     2      33      13          40   40.00000
# 2     35-44      F     1      20       7          33   33.33333
# 3     25-34      F     2      47      13          29   28.57143
# 4     45-54      M     3      33      20          60   60.00000
# 5     35-44      M     2      20      13          67   66.66667
# 6     25-34      M     5      47      33          71   71.42857
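
Note that grp_pct is repeated once per gender. If you need exactly one value per age group, a minimal sketch is to deduplicate on the group itself rather than on the percentage, since two age groups could share the same percentage. The counts below are copied from the table above, and the name grp_labels is made up for illustration:

```r
# Counts per age group and gender, copied from the output above
dat <- data.frame(
  age_group = rep(c("45-54", "35-44", "25-34"), times = 2),
  Gender    = rep(c("F", "M"), each = 3),
  value     = c(2, 1, 2, 3, 2, 5)
)

# Share of all rows that falls in each age group (repeated per gender)
dat$grp_pct <- round(ave(dat$value, dat$age_group, FUN = sum) / sum(dat$value) * 100)

# Keep one row per age group; deduplicate on the group, not on the percentage
grp_labels <- unique(dat[c("age_group", "grp_pct")])
grp_labels
```

This keeps the label tied to its age group, so nothing depends on row order or on the percentages being distinct.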



ggplot(dat, aes(x = age_group, y = value, fill = Gender)) +
  geom_col() + coord_flip() + ylab("Visits 2018-2019") +xlab("") +
  scale_fill_manual(values= c("#740404", "#AB6868", "#D5B3B3"), 
                    labels = c("Females", "Males", "N/A")) +
  theme(legend.title=element_blank()) +
  geom_text(aes(label = paste0(age_per, "%")), hjust = 2.7, 
            position = "stack", color = "white", size =5) + 
  annotate("text", x=1, y=5.25, label = paste0(dat$grp_pct[[1]], "%")) +
  annotate("text", x=2, y=3.25, label = paste0(dat$grp_pct[[2]], "%")) +
  annotate("text", x=3, y=7.25, label = paste0(dat$grp_pct[[3]], "%"))



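For dynamic annotation, you may have to use the functional form of ggplot via Reduce, where + (not actually the arithmetic plus operator) is exposed as the +.gg() operator. Then call Map to iterate over unique(grp_pct), passing in the x coordinate and the label for each annotation. Note that unique(grp_pct) would merge two age groups that happen to share the same percentage, so this assumes all group percentages differ. The remaining challenge is that the best y coordinate is unknown, so it is fixed at 7.25 below.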

Reduce(ggplot2:::`+.gg`, 

       c(list(ggplot(dat, aes(x = age_group, y = value, fill = Gender)),
              geom_col(), coord_flip(), ylab("Visits 2018-2019"), xlab(""),
              scale_fill_manual(values= c("#740404", "#AB6868", "#D5B3B3"),
                              labels = c("Females", "Males", "N/A")),
              theme(legend.title=element_blank()),
              geom_text(aes(label = paste0(age_per, "%")), hjust = 2.7, 
                        position = "stack", color = "white", size =5) 
         ),
         Map(function(x_loc, g_lab) annotate("text", x=x_loc, y=7.25,
                                                label = paste0(g_lab, "%")),
             seq(length(unique(dat$grp_pct))), unique(dat$grp_pct)
         )
       )
)
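
One possible way around the unknown y coordinate, offered as a sketch rather than a definitive fix, is to give geom_text its own one-row-per-group data frame and use each bar's total as the y position. The counts are copied from the table above; labs_df and its label column are names invented here, the fill scale is cut down to the two genders present in the sample, and expansion() assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Counts per age group and gender, copied from the output above
dat <- data.frame(
  age_group = factor(rep(c("45-54", "35-44", "25-34"), times = 2),
                     levels = c("45-54", "35-44", "25-34")),
  Gender    = rep(c("F", "M"), each = 3),
  value     = c(2, 1, 2, 3, 2, 5)
)

# One label row per age group: bar length (value) and share of all rows
labs_df <- aggregate(value ~ age_group, data = dat, FUN = sum)
labs_df$label <- paste0(round(labs_df$value / sum(labs_df$value) * 100), "%")

p <- ggplot(dat, aes(x = age_group, y = value, fill = Gender)) +
  geom_col() +
  # inherit.aes = FALSE keeps the fill mapping out of the label layer;
  # a negative hjust pushes the text just past the end of each bar
  geom_text(data = labs_df, aes(x = age_group, y = value, label = label),
            inherit.aes = FALSE, hjust = -0.2, color = "#740404", size = 5) +
  # extra room past the longest bar so the labels are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  coord_flip() + ylab("Visits 2018-2019") + xlab("") +
  scale_fill_manual(values = c("#740404", "#AB6868"),
                    labels = c("Females", "Males")) +
  theme(legend.title = element_blank())
p
```

Because the label positions come from the data, this should scale to any number of age groups without hard-coded x or y values.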
