
I'm trying to make a graph showing the percentage of men and women in different age groups who have kids under the age of 18. I'd like two bars (one for men, one for women) side by side for each age group, and I'd like each bar to be stacked, with the percentage who have kids on the bottom and the percentage who don't on the top. I cannot figure out how to make such a graph in ggplot2 and would greatly appreciate suggestions.

I calculated my grouped stats using dplyr:

kid18summary <- marsub %>%
  group_by(AgeGroup, sex, kid_under_18) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

Which yielded this:

dput(kid18summary)
structure(list(AgeGroup = c("Age<40", "Age<40", "Age<40", "Age<40", 
"Age41-49", "Age41-49", "Age41-49", "Age41-49", "Age50-64", "Age50-64", 
"Age50-64", "Age50-64"), sex = structure(c(1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("Male", "Female"), class = "factor"), 
    kid_under_18 = c("No", "Yes", "No", "Yes", "No", "Yes", "No", 
    "Yes", "No", "Yes", "No", "Yes"), freq = c(0.625, 0.375, 
    0.636833046471601, 0.363166953528399, 0.349557522123894, 
    0.650442477876106, 0.444897959183673, 0.555102040816327, 
    0.724852071005917, 0.275147928994083, 0.819548872180451, 
    0.180451127819549)), .Names = c("AgeGroup", "sex", "kid_under_18", 
"freq"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -12L), vars = list(AgeGroup, sex), drop = TRUE, indices = list(
    0:1, 2:3, 4:5, 6:7, 8:9, 10:11), group_sizes = c(2L, 2L, 
2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list(
    AgeGroup = c("Age<40", "Age<40", "Age41-49", "Age41-49", 
    "Age50-64", "Age50-64"), sex = structure(c(1L, 2L, 1L, 2L, 
    1L, 2L), .Label = c("Male", "Female"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L), vars = list(AgeGroup, sex), drop = TRUE, .Names = c("AgeGroup", 
"sex")))
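
As a side note on why `freq` sums to 1 within each age group and sex: `summarise()` drops the last grouping variable (`kid_under_18`), so the `mutate()` that follows divides by the total for each AgeGroup and sex pair. Here is a minimal sketch with a made-up stand-in for `marsub` (the name `marsub_toy` and its rows are hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later):

```r
library(dplyr)

# Hypothetical toy data standing in for marsub (not the real dataset)
marsub_toy <- data.frame(
  AgeGroup     = rep("Age<40", 6),
  sex          = c("Male", "Male", "Male", "Female", "Female", "Female"),
  kid_under_18 = c("Yes", "No", "No", "Yes", "Yes", "No")
)

toy_summary <- marsub_toy %>%
  group_by(AgeGroup, sex, kid_under_18) %>%
  # drop_last keeps the AgeGroup + sex grouping, making the default explicit
  summarise(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore the total within each AgeGroup + sex pair
  mutate(freq = n / sum(n)) %>%
  ungroup()

print(toy_summary)
```

Calling `ungroup()` at the end is optional, but it avoids carrying the grouping into later steps such as `arrange()`.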

I can plot the percentage of people in each age group and gender who don't have kids under 18:

ggplot(kid18summary, aes(x = factor(AgeGroup), y = freq, fill = factor(sex))) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_y_continuous(labels = scales::percent)

Or I can make a faceted, stacked bar chart, which is closer to what I'd like. I want to show both the "Yes" and the "No" even though the percentages add up to 100, because I think it's easier to compare colored bars than negative space. The only trouble is that no matter what I do, the "No"s are on the bottom and the "Yes"es on the top, and I'd like it the other way around. (Ideally, I'd have different colors for men and women, say dark blue for men with kids and light blue for men without, dark red for women with kids and light red for women without, but I've given up on that for the time being.)

I've tried to change the order of the factors in a variety of ways, all completely unsuccessful.

As suggested in the ggplot2 documentation, I've tried changing the order of the factor levels directly:

kid18summary$kid_under_18 <- as.factor(kid18summary$kid_under_18)
o <- c("Yes", "No")  # which I've also changed to ("No", "Yes"), which makes no difference; the order of the Yes and No in the legend changes, but the "Yes" bars stay on top
kid18summary$kid_under_18 <- factor(kid18summary$kid_under_18, levels = o)

kid18summary$kid_under_18 <- factor(kid18summary$kid_under_18, levels(kid18summary$kid_under_18)[c("Yes", "No")]) # changing to [c("No", "Yes")] also only changes the order of the legend
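
One thing worth checking in the line above: indexing with character strings looks up element names, and `levels()` returns an unnamed character vector, so `levels(x)[c("Yes", "No")]` gives `NA`s rather than a reordered set of levels. A small base R sketch on a toy factor (the names `x` and `x_reordered` are illustrative only):

```r
# Toy factor with the default alphabetical levels: "No", "Yes"
x <- factor(c("No", "Yes", "No", "Yes"))

# Character indexing matches names, and levels(x) has none,
# so both lookups come back as NA
looked_up <- levels(x)[c("Yes", "No")]
print(looked_up)

# Passing the desired order directly as the levels does the reordering
x_reordered <- factor(x, levels = c("Yes", "No"))
print(levels(x_reordered))
```

Indexing by position, for example `levels(x)[c(2, 1)]`, would also work, though spelling out the level names is usually easier to read.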

I've tried the answer suggested in another question and added another ordered factor:

kid18summary <- transform(kid18summary, stack.ord = factor(kid_under_18, levels = c("Yes", "No"), ordered = TRUE))
ggplot(kid18summary, aes(x = factor(sex), y = freq, fill = factor(stack.ord))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~AgeGroup, nrow = 1)

Or just adding another dummy variable:

kid18summary$orderfactor <- "NA"
kid18summary$orderfactor[kid18summary$kid_under_18 == "Yes"] <- 0
kid18summary$orderfactor[kid18summary$kid_under_18 == "No"] <- 1
ggplot(kid18summary, aes(x = factor(sex), y = freq, fill = factor(orderfactor))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~AgeGroup, nrow = 1)

All of these give me a lot of different ways to switch the colors of the "Yes" and "No" groups in the bars, but none of them changes which one is on top.

[Images: Plot 1, Plot 2]

After setting the level order of your fill factor, you need to sort your dataset by that factor. See this answer. - aosmith
Also, different colors for different combinations are likely doable. If you're still interested, you might ask a question about that after taking a look at this question/answer. - aosmith

1 Answer


With the answers suggested by aosmith, I ended up with the following, which does exactly what I wanted:

# df is the summary data frame (kid18summary in the question)
ggplot(arrange(df, kid_under_18),
       aes(x = factor(sex), y = freq, fill = interaction(sex, factor(kid_under_18)))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~AgeGroup, nrow = 1)
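
For anyone on a newer ggplot2: as far as I understand, since version 2.2.0 the stacking order follows the levels of the fill factor rather than the row order of the data, so sorting with `arrange()` may no longer have any effect. Below is a hedged sketch of the same plot under that assumption, including the four colors asked about in the question. The data frame `kid18`, the column `sex_kid`, and the specific color names are my own choices, and the `freq` values are copied from the `dput()` output above.

```r
library(ggplot2)

# Summary values copied from the dput() output in the question
kid18 <- data.frame(
  AgeGroup = rep(c("Age<40", "Age41-49", "Age50-64"), each = 4),
  sex = factor(rep(rep(c("Male", "Female"), each = 2), times = 3),
               levels = c("Male", "Female")),
  kid_under_18 = factor(rep(c("No", "Yes"), times = 6), levels = c("No", "Yes")),
  freq = c(0.625, 0.375, 0.636833046471601, 0.363166953528399,
           0.349557522123894, 0.650442477876106,
           0.444897959183673, 0.555102040816327,
           0.724852071005917, 0.275147928994083,
           0.819548872180451, 0.180451127819549)
)

# One fill level per sex/answer combination, e.g. "Male.Yes"
kid18$sex_kid <- interaction(kid18$sex, kid18$kid_under_18)

p <- ggplot(kid18, aes(x = sex, y = freq, fill = sex_kid)) +
  # If the stack comes out upside down, use
  # geom_col(position = position_stack(reverse = TRUE)) instead
  geom_col() +
  scale_fill_manual(
    name = "Sex / kids under 18",
    values = c(Male.Yes = "darkblue", Male.No = "lightblue",
               Female.Yes = "darkred", Female.No = "lightpink")
  ) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~AgeGroup, nrow = 1)

print(p)
```

Naming the entries in `values` ties each color to a specific combination, so the colors stay correct even if the level order changes later.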