
This is my very first question on Stack Overflow. I have a question about creating a bar plot with three categorical variables in R. I have only been using R for three weeks, so I hope you can help me with this problem.

I have a dataframe that summarizes the number of females and males in two places (place1 and place2) per age group. I am interested in the proportions of males and females in both places and per age group for comparison. The data looks as follows:

# Females
data_female <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
                          number_place1 = c(7000, 12000, 15000,40000, 36000, 10000, 13000),
                          number_place2 = c(163000, 360000, 350000,800000, 900000, 360000, 370000))
# Extra columns: share of each age group within each place (requires dplyr)
library(dplyr)
data_female <- data_female %>%
               mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
                      percentage_place2 = number_place2 / sum(number_place2) * 100,
                      gender = "F") %>%
               select(agegroup, percentage_place1, percentage_place2, gender)

# Males
data_male <- data.frame(agegroup = c("0-4","5-14","15-24","25-44","45-64","65-74","75-120"),
                          number_place1 = c(6000, 13000, 13000,38000, 37000, 9000, 12000),
                          number_place2 = c(161000, 340000, 320000,699000, 900230, 330600, 385000))
# Extra columns: share of each age group within each place
data_male <- data_male %>%
               mutate(percentage_place1 = number_place1 / sum(number_place1) * 100,
                      percentage_place2 = number_place2 / sum(number_place2) * 100,
                      gender = "M") %>%
               select(agegroup, percentage_place1, percentage_place2, gender)

Both dataframes are then combined into one and 'pivot_longer' is used to create a 'long' dataframe:

data <- rbind(data_female, data_male)

# pivot_longer() comes from tidyr
library(tidyr)
data_long <- data %>%
              rename(place1 = percentage_place1, place2 = percentage_place2) %>%
              pivot_longer(cols = c("place1", "place2"), names_to = "place", values_to = "percentage")

In the end I have a dataframe with the following columns:

  • agegroup
  • gender (M/F)
  • place (place1 / place2)
  • percentage (share of each age group among the males/females of a given place)
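As a quick sanity check on the percentage column, here is a minimal base R sketch using only the female counts for place1 from above. The names `counts` and `pct` are just for illustration and are not part of the original code. It assumes the percentages are meant to be shares within one gender and one place, so they should add up to 100 for each gender/place combination:

```r
# Female counts for place1, copied from the data above
counts <- c("0-4" = 7000, "5-14" = 12000, "15-24" = 15000, "25-44" = 40000,
            "45-64" = 36000, "65-74" = 10000, "75-120" = 13000)

# Share of each age group within this gender/place combination, in percent
pct <- counts / sum(counts) * 100

# Rounded shares per age group
print(round(pct, 1))

# Total of all shares; should be 100 up to floating point rounding
print(sum(pct))
```

The same check can be repeated for the other three gender/place combinations.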

From this dataframe, I want to create a graph that looks exactly like the figure that can be found here:

[Image: example figure]

It is a bar graph with:

  • x-axis: place and gender nested inside age group. For example, within one age group, males have a lighter color (left two bars) and females a darker color (right two bars); within each gender, there are two bars: place1 = green and place2 = blue.
  • y-axis: percentage (proportion)

For now, I have a figure with code like this:

ggplot(data_long, aes(x = agegroup, y = percentage, fill = interaction(place, gender))) +
  geom_bar(position = 'dodge', stat = 'identity') +
  facet_wrap( ~ place)

This figure has two larger panels, "place1" and "place2" (because of facet_wrap()), but I want to combine them into a single bar graph like the example figure. Also, how can I create the nice table underneath the bar graph, as in the example?

I hope it is clear what I mean. Is there someone who has experience with creating such figures?

1 Answer

You can use the "sneaky facets" approach.

First ensure that your categorical variables are in the desired order:

agelevels <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")
data_long <- data_long %>% mutate(agegroup = factor(agegroup, agelevels),
                                  gender = factor(gender, c("M", "F")))
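To see why the explicit levels matter, here is a small base R sketch (the vector name `ages` is only for illustration). Without explicit levels, `factor()` sorts the labels as text, so "5-14" ends up after "45-64" instead of after "0-4":

```r
ages <- c("0-4", "5-14", "15-24", "25-44", "45-64", "65-74", "75-120")

# Default: levels are sorted as text, which breaks the natural age order
default_levels <- levels(factor(ages))
print(default_levels)

# Explicit levels: the order is kept exactly as given
explicit_levels <- levels(factor(ages, levels = ages))
print(explicit_levels)
```

ggplot2 uses the factor level order for axes and facets, which is why the conversion is done before plotting.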

Then we plot with gender on the x axis, and fill according to the interaction between place and gender. We then facet by age group along the x axis, removing the spacing between the panels and each panel's border. Finally, we switch the facet strip position to the bottom (on the outside) and remove its background to make it look like a secondary x axis:

ggplot(data_long, aes(x = gender, y = percentage, 
                      fill = interaction(place, gender))) +   
  geom_col(position = 'dodge', color = "gray50") +
  facet_grid( ~ agegroup, switch = "x") +
  scale_fill_manual(values = c("#a8d094", "#9fc0e7", "#97a891", "#95a5c2"),
                    labels = c("Male, place 1", "Male, place 2",
                               "Female, place 1", "Female, place 2")) +
  labs(fill = "", x = "Age group") +
  theme_bw() +
  theme(panel.spacing = unit(0, "points"),
        panel.border = element_blank(),
        axis.line = element_line(),
        strip.placement = "outside",
        strip.background = element_blank(),
        legend.position = "bottom",
        panel.grid.major.x = element_blank())

[Image: resulting plot]
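The colors and labels in scale_fill_manual() are matched to the levels of interaction(place, gender) by position, so their order matters. As a minimal sketch (the objects `place`, `gender`, `combos` and `fill_levels` are illustrative, and it assumes gender has the levels "M", "F" as set above), you can inspect that order in base R:

```r
place  <- factor(c("place1", "place2"))
gender <- factor(c("M", "F"), levels = c("M", "F"))

# All place/gender combinations, keeping the factor levels
combos <- expand.grid(place = place, gender = gender)

# interaction() lets the first factor (place) vary fastest by default
fill_levels <- levels(interaction(combos$place, combos$gender))
print(fill_levels)
```

If you change the gender levels or swap the arguments of interaction(), the values and labels need to be reordered to match.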