This is my very first question on Stack Overflow. I have a question about creating a bar plot with three categorical variables in R. I have only been using R for three weeks, so I hope you can help me with this problem.
I have a dataframe that summarizes the number of females and males in two places (place1 and place2) per age group. I am interested in the proportions of males and females in both places and per age group for comparison. The data looks as follows:
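library(dplyr)   # mutate(), select(), rename(), %>%
library(tidyr)   # pivot_longer()
library(ggplot2) # plotting
# Females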
data_female <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
number_place1 = c(7000, 12000, 15000,40000, 36000, 10000, 13000),
number_place2 = c(163000, 360000, 350000,800000, 900000, 360000, 370000))
# Extra columns
data_female <- data_female %>%
mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
percentage_place2 = number_place2 / sum(number_place2) * 100,
gender = "F") %>%
select(agegroup, percentage_place1, percentage_place2, gender)
# Males
data_male <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
number_place1 = c(6000, 13000, 13000,38000, 37000, 9000, 12000),
number_place2 = c(161000, 340000, 320000,699000, 900230, 330600, 385000))
# Extra columns
data_male <- data_male %>%
mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
percentage_place2 = number_place2 / sum(number_place2) * 100,
gender = "M") %>%
select(agegroup, percentage_place1, percentage_place2, gender)
Both dataframes are then combined into one and 'pivot_longer' is used to create a 'long' dataframe:
data <- rbind(data_female, data_male)
data_long <- data %>%
rename(place1 = percentage_place1, place2 = percentage_place2) %>%
pivot_longer(cols = c("place1","place2"),names_to = "place", values_to = "percentage")
In the end, I have a dataframe with the following columns:
- agegroup
- gender (M/F)
- place (place1 / place2)
- percentage (share of each age group among the males or females of a given place)
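To make the meaning of the percentage column concrete, here is a minimal base R sketch using only the female place1 counts from above. It assumes the percentages are meant as each age group's share within one gender and one place, so the seven values for a gender/place combination should add up to 100.

```r
# Female counts for place1, copied from the data above
number_place1 <- c(7000, 12000, 15000, 40000, 36000, 10000, 13000)
names(number_place1) <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")

# Share of each age group among all females in place1, in percent
percentage_place1 <- number_place1 / sum(number_place1) * 100

round(percentage_place1, 1)
sum(percentage_place1)  # should be 100, up to floating-point rounding
```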
From this dataframe, I want to create a graph that looks exactly like the figure that can be found here:
It is a bar graph with:
- x-axis: place and gender nested inside age group. For example, within one age group, males have a lighter color (left two bars) and females have a darker color (right two bars). Within each gender, there are two bars: place1 = green and place2 = blue.
- y-axis: percentage (proportion)
For now, I have a figure with code like this:
ggplot(data_long, aes(x = agegroup, y = percentage, fill = interaction(place, gender))) +
geom_bar(position = 'dodge', stat = 'identity') +
facet_wrap(~ place)
This figure has two separate panels, "place1" and "place2" (because of facet_wrap()), but I want to combine them into a single panel as in the example figure. Also, how can I create the table underneath the bar graph, as in the example?
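One direction that might get closer to the single-panel layout, sketched under assumptions rather than as a confirmed solution: drop facet_wrap() and dodge on the combination of place and gender, with the factor levels ordered so that the male bars come first. The `counts` list, the `group` column, and the four color names are my own illustrative choices, and the table underneath the graph is not covered here.

```r
library(ggplot2)

ages <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")

# Counts copied from the question; names encode gender and place (hypothetical helper list)
counts <- list(
  F.place1 = c(7000, 12000, 15000, 40000, 36000, 10000, 13000),
  F.place2 = c(163000, 360000, 350000, 800000, 900000, 360000, 370000),
  M.place1 = c(6000, 13000, 13000, 38000, 37000, 9000, 12000),
  M.place2 = c(161000, 340000, 320000, 699000, 900230, 330600, 385000)
)

# One row per age group, gender and place; percentages are within each gender/place
data_long <- do.call(rbind, lapply(names(counts), function(key) {
  parts <- strsplit(key, ".", fixed = TRUE)[[1]]
  data.frame(
    agegroup   = ages,
    gender     = parts[1],
    place      = parts[2],
    percentage = counts[[key]] / sum(counts[[key]]) * 100
  )
}))

# Fix the orderings: age groups as listed, males before females, place1 before place2
data_long$agegroup <- factor(data_long$agegroup, levels = ages)
data_long$gender   <- factor(data_long$gender, levels = c("M", "F"))
data_long$place    <- factor(data_long$place, levels = c("place1", "place2"))

# interaction() varies its first factor fastest: place1.M, place2.M, place1.F, place2.F
data_long$group <- interaction(data_long$place, data_long$gender)

p <- ggplot(data_long, aes(x = agegroup, y = percentage, fill = group)) +
  geom_col(position = "dodge") +  # same as geom_bar(stat = "identity")
  scale_fill_manual(values = c(
    "place1.M" = "lightgreen", "place2.M" = "lightblue",
    "place1.F" = "darkgreen",  "place2.F" = "darkblue"
  )) +
  labs(x = "Age group", y = "Percentage", fill = "Place and gender")

print(p)
```

Converting agegroup to a factor matters here because a plain character column would be sorted alphabetically on the x-axis, which would put "5-14" after "45-64".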
I hope it is clear what I mean. Is there someone who has experience with creating such figures?