
I am trying to make a stacked bar plot based on counts, with labels showing the percentage for each segment. I have produced the plot below. However, the percentages are calculated over all of the data. What I am after is the percentage within each team, so that the percentages for Australia sum to 100% and the percentages for England sum to 100%.

[Image: Team plot]

The code that produces this plot is the following function. It counts the number of different roles in each team across 5 matches. I have had to divide the result by 10 because a player's role appears twice for each match (5 matches x 2 appearances):

team_roles_Q51 <- function(){
        ashes_df <- tidy_data()
        
        graph <- ggplot(ashes_df %>%
                        count(team, role) %>%       #Groups by team and role
                        mutate(pct=n/sum(n)),       #Calculates % for each role
               aes(team, n, fill=role)) +
                geom_bar(stat="identity") +
                scale_y_continuous(labels=function(x)x/10) +      #Needs to be a better way than dividing by 10
                ylab("Number of Participants") +
                geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")),
                          position=position_stack(vjust=0.5)) +
                ggtitle("England & Australia Team Make Up") +
                theme_bw()

        print(graph)
}

An example of the dataframe that is imported is:

[Image: Data frame imported]

The structure of the first 10 rows of the data frame is as follows:

structure(list(batter = c("Ali", "Anderson", "Bairstow", "Ball", 
"Bancroft", "Bird", "Broad", "Cook", "Crane", "Cummins"), team = structure(c(2L, 
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("Australia", 
"England"), class = "factor"), role = structure(c(1L, 3L, 4L, 
3L, 2L, 3L, 3L, 2L, 3L, 3L), .Label = c("allrounder", "batsman", 
"bowler", "wicketkeeper"), class = "factor"), innings = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("test_1_innings_1", 
"test_1_innings_2", "test_2_innings_1", "test_2_innings_2", "test_3_innings_1", 
"test_3_innings_2", "test_4_innings_1", "test_4_innings_2", "test_5_innings_1", 
"test_5_innings_2"), class = "factor"), batting_num = c(6, 11, 
7, 10, 1, NA, 9, 1, NA, 9), score = c(38, 5, 9, 14, 5, NA, 20, 
2, NA, 42), balls_faced = c(102, 9, 24, 11, 19, NA, 32, 10, NA, 
120)), row.names = c(NA, 10L), class = "data.frame")

Any help would be appreciated. Thanks

Sounds like you need to add group_by(team) before your mutate(): ashes_df %>% count(team, role) %>% group_by(team) %>% mutate(pct=n/sum(n)). This doesn't really have anything to do with ggplot2 in this case. - MrFlick

1 Answer


You need to group by team before calculating the proportion, and then use pct in aes():

library(dplyr)
library(ggplot2)

ashes_df %>%
  count(team, role) %>%                  # one row per team/role combination
  group_by(team) %>%                     # make the next step work within each team
  mutate(pct = prop.table(n) * 100) %>%  # share of each role within its team, in %
  ggplot() + aes(team, pct, fill=role) +
  geom_bar(stat="identity") +
  ylab("Percentage of Participants") +   # the y axis now shows pct, not counts
  geom_text(aes(label=paste0(sprintf("%1.1f", pct),"%")),
            position=position_stack(vjust=0.5)) +
  ggtitle("England & Australia Team Make Up") +
  theme_bw()
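
To check that grouping by team really gives percentages that sum to 100 within each team, here is a minimal sketch. The data frame `sample_df` is only a stand-in for `ashes_df`, rebuilt from the team and role columns of the ten sample rows in the question, and the names `pct_df` and `team_totals` are made up for illustration.

```r
library(dplyr)

# Stand-in for ashes_df: team and role of the ten sample rows from the question
sample_df <- data.frame(
  team = c("England", "England", "England", "England", "Australia",
           "Australia", "England", "England", "England", "Australia"),
  role = c("allrounder", "bowler", "wicketkeeper", "bowler", "batsman",
           "bowler", "bowler", "batsman", "bowler", "bowler")
)

pct_df <- sample_df %>%
  count(team, role) %>%              # n = rows per team/role combination
  group_by(team) %>%                 # sum(n) below is now computed per team
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

# Add the percentages back up per team as a sanity check
team_totals <- pct_df %>%
  group_by(team) %>%
  summarise(total = sum(pct))

print(pct_df)
print(team_totals)
```

Without the `group_by(team)` line, `sum(n)` would be the total over both teams, which is the behaviour described in the question.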

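
If the bar heights should stay as counts, as in the question, with only the labels showing the within-team percentage, something along these lines might work. This is a sketch under assumptions: `sample_df` is a hypothetical stand-in for `ashes_df` with six players repeated over two innings, and it assumes each player has a single team and role, so `distinct()` can reduce the data to one row per player instead of dividing the axis labels by 10. Note that this counts every player who appears anywhere in the data once, which matches the divide-by-10 result only if every player appears in all ten innings.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for ashes_df: six players, each listed in two innings
sample_df <- data.frame(
  batter = rep(c("Ali", "Anderson", "Bairstow", "Bancroft", "Bird", "Cummins"),
               times = 2),
  team   = rep(c("England", "England", "England",
                 "Australia", "Australia", "Australia"), times = 2),
  role   = rep(c("allrounder", "bowler", "wicketkeeper",
                 "batsman", "bowler", "bowler"), times = 2)
)

role_counts <- sample_df %>%
  distinct(batter, team, role) %>%   # one row per player, however many innings
  count(team, role) %>%
  group_by(team) %>%
  mutate(pct = n / sum(n)) %>%       # proportion within each team
  ungroup()

graph <- ggplot(role_counts, aes(team, n, fill = role)) +
  geom_col() +                       # same as geom_bar(stat = "identity")
  geom_text(aes(label = sprintf("%1.1f%%", pct * 100)),
            position = position_stack(vjust = 0.5)) +
  ylab("Number of Participants") +
  ggtitle("England & Australia Team Make Up") +
  theme_bw()

print(graph)
```

Here the y axis keeps the count `n`, so the axis label from the question still applies, while the text labels are computed per team.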