The dataset
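# dplyr provides %>%, count(), filter() and mutate(); ggplot2 provides the plotting functions
library(dplyr)
library(ggplot2)

gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Female')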
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')
df <- data.frame(gender, answer)
is biased towards females (12 females versus 8 males):
df %>% ggplot(aes(gender, fill = gender)) + geom_bar()
My task is to build a graph that makes it easy to figure out which of the two genders is more likely to say 'Yes'.
But, given the bias, I cannot just do
df %>% ggplot(aes(x = answer, fill = gender)) + geom_bar(position = 'dodge')
or even
df %>% ggplot(aes(x = answer, y = after_stat(count / sum(count)), fill = gender)) +
  geom_bar(position = 'dodge')
To alleviate the bias I need to divide each count by the total number of males or females, respectively, so that the 'Female' bars add up to 1, as do the 'Male' ones. Like so:
df.total <- df %>% count(gender)
male.total <- (df.total %>% filter(gender == 'Male'))$n
female.total <- (df.total %>% filter(gender == 'Female'))$n
df %>% count(answer, gender) %>%
  mutate(freq = n / if_else(gender == 'Male', male.total, female.total)) %>%
  ggplot(aes(x = answer, y = freq, fill = gender)) +
  geom_bar(stat = 'identity', position = 'dodge')
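Which draws a completely different picture: in raw counts more females than males say 'Yes' (5 versus 4), but proportionally males are more likely to (4/8 = 0.50 versus 5/12 ≈ 0.42).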
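As a quick sanity check on the normalization described above, the per-gender proportions can also be tabulated numerically in base R. This is only a sketch for verifying the numbers behind the chart, not a replacement for the plot; the names counts and props are introduced here purely for illustration.

```r
# Same data as above, repeated so the snippet runs on its own
gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')
df <- data.frame(gender, answer)

# Cross-tabulate gender (rows) against answer (columns)
counts <- table(df$gender, df$answer)

# margin = 1 divides every row by its own total,
# so the proportions for each gender sum to 1
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 3))
```

Each row of props should sum to 1, which is the same property the 'Female' and 'Male' bars are meant to have in the chart.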
Questions:
- Is there a way to simplify the above piece of code using only dplyr and ggplot2?
- Are there any other libraries that can do the trick better?
- Does the above type of chart have a conventional name?
Thanks.