I am trying to plot a histogram with the overall count of each bin displayed on top of it.
You can use the following sample data:
histData <- data.frame("UserId" = 1:20, "age" = c(replicate(20,sample(10:20,20,rep=TRUE))), "Gender" = c("Male", "Female"))
I am using ggplot as shown below:
ggplot(histData, aes(x = age, color = Gender, fill = Gender)) +
  geom_histogram(
    binwidth = 1,
    alpha = 0.2,
    position = "identity",
    aes(y = 100 * (..count..) / sum(..count..))
  ) +
  scale_color_manual(values = rainbow(3)) +
  geom_vline(
    aes(xintercept = mean(age)),
    color = "black",
    linetype = "dashed",
    size = 1
  ) +
  labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  stat_bin(
    aes(
      y = round(100 * (..count..) / sum(..count..), 1),
      label = round(100 * (..count..) / sum(..count..), 1)
    ),
    geom = "text",
    vjust = 0,
    binwidth = 1
  )
which results in the plot as shown below:
In the plot, the count for each gender is displayed separately, on top of its respective bin. However, I do not want gender-specific counts. I just want the overall count on top of each bin stack (i.e. only the red numbers, which show the overall count). How do I achieve that while keeping the aes(x = age, color = Gender, fill = Gender) aesthetics in my ggplot2 call to distinguish the genders?
EDIT: Based on the answer below, I tried the following:
# Drop UserId; keep age and Gender
ageGroupCount <- histData[, -1]
ageGroupCount$age <- as.integer(ageGroupCount$age)
ageGroupCount$Gender <- as.factor(ageGroupCount$Gender)
ageGroupCount <- ageGroupCount %>% group_by(age, Gender) %>% count()
ageCount <- histData[2] %>% count()
ageGroupCount %>%
  ggplot(aes(x = age, y = freq, label = freq)) +
  geom_col(aes(fill = Gender, color = Gender), alpha = 0.65) +
  scale_y_continuous(labels = percent) +
  geom_text(
    data = ageCount,
    size = 3,
    position = position_dodge(width = 1),
    vjust = -0.5
  ) +
  geom_vline(
    aes(xintercept = mean(age)),
    color = "black",
    linetype = "dashed",
    size = 1
  ) +
  scale_color_manual(values = rainbow(3)) +
  labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
which resulted in the following plot:
How do I get rid of the trailing zeros in the scale, and how do I show the percent values on top of each bar instead of the absolute numbers?
ANSWER: I was able to do it using the code below:
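# Assumes ggplot2, dplyr and plyr are attached, with plyr attached last:
# the freq column used below is what plyr::count() returns (dplyr::count() names it n).
ageGroupCount <- histData[, -1]
ageGroupCount$age <- as.integer(ageGroupCount$age)
ageGroupCount$Gender <- as.factor(ageGroupCount$Gender)
# Count per age/Gender combination, then convert the counts to percentages
ageGroupCount <- ageGroupCount %>% group_by(age, Gender) %>% count()
ageGroupCount <- mutate(ageGroupCount, freq = round(100 * freq / sum(freq), 1))
# Overall count per age, converted the same way; these become the labels
ageCount <- histData[2] %>% count()
ageCount$age <- as.integer(ageCount$age)
ageCount <- mutate(ageCount, freq = round(100 * freq / sum(freq), 1))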
ageGroupCount %>%
  ggplot(aes(x = age, y = freq, label = freq)) +
  geom_col(aes(fill = Gender, color = Gender), alpha = 0.65) +
  # Labels come from ageCount, so each stack gets one overall value
  geom_text(
    data = ageCount,
    size = 3,
    position = position_dodge(width = 1),
    vjust = -0.5
  ) +
  # Take the mean from the raw data; mean(age) on the aggregated table
  # would ignore how many users fall into each age
  geom_vline(
    xintercept = mean(histData$age),
    color = "black",
    linetype = "dashed",
    size = 1
  ) +
  scale_color_manual(values = rainbow(3)) +
  # Append "%" to values that are already percentages
  scale_y_continuous(labels = function(x) paste0(x, "%")) +
  labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
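For anyone who wants to reproduce this without plyr or dplyr, here is a minimal sketch of the same idea with the aggregation done in base R. The seed, the 400-row sample, and the names cells and totals are my own assumptions for illustration, not part of the code above. It keeps the bar heights unrounded and rounds only the labels, so each label sits exactly on top of its stack.

```r
# Hypothetical sample data in the same shape as histData (seed chosen arbitrarily)
set.seed(42)
histData <- data.frame(
  UserId = 1:400,
  age    = sample(10:20, 400, replace = TRUE),
  Gender = rep(c("Male", "Female"), times = 200)
)

# One row per age/Gender cell, with its share of all users (the stacked bars)
cells <- as.data.frame(
  table(age = histData$age, Gender = histData$Gender),
  responseName = "n",
  stringsAsFactors = FALSE
)
cells$age <- as.integer(cells$age)
cells$pct <- 100 * cells$n / sum(cells$n)

# One row per age, with the overall share (the labels on top of each stack)
totals <- aggregate(n ~ age, data = cells, FUN = sum)
totals$pct <- 100 * totals$n / sum(totals$n)

# Plot only if ggplot2 is installed, so the aggregation can be run on its own
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cells, aes(x = age, y = pct)) +
    geom_col(aes(fill = Gender, color = Gender), alpha = 0.65) +
    geom_text(data = totals, aes(label = round(pct, 1)), size = 3, vjust = -0.5) +
    geom_vline(xintercept = mean(histData$age), color = "black", linetype = "dashed") +
    scale_y_continuous(labels = function(x) paste0(x, "%")) +
    labs(title = "Age histogram plot", x = "Age", y = "Percentage") +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5))
  print(p)
}
```

The custom labels function is used instead of scales::percent because percent multiplies its input by 100 by default, which is not what you want when the values are already percentages.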