0
votes

I would like to create a stacked bar graph with ggplot where the heights of the bars depend on the values of one variable (voter turnout in %) and the stacks of the bars individually add up to 100% of another variable (voteshare in %). So for the year 1990 there was a voter turnout of 96.7 and the bar should be filled with the individual voteshares of each party, which add up to 100% (of the 96.7%). I look at the data of 3 parties and 3 years.

Here is my data:

party <- c("a", "b", "c", "a", "b", "c", "a", "b", "c")
year <- c(1990, 1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992)
voteshare <- c(0, 33.5, 66.5, 40.5, 39.0, 20.5, 33.6, 33.4, 33)
turnout <- c(96.7, 96.7, 96.7, 85.05, 85.05, 85.05, 76.41, 76.41, 76.41)
df <- data.frame(party, year, voteshare, turnout)

In addition, I would like to put the numbers of the individual voteshares and the total turnout inside the graph.

My approach so far:

ggplot(df, aes(x=year, y=interaction(turnout, voteshare), fill=party)) + 
    geom_bar(stat="identity", position=position_stack()) +
    geom_text(aes(label=voteshare), vjust=0.5)

It's a mess.

Thanks a ton in advance!

1
Can you add a little more clarity to "it's a mess"? What is it doing or not doing? How is this different from your expectation? Anything you can add would help folks chase the right issue(s). - Jesse Q

There are a few problems with your dataframe: party is only 3 items long. I'm guessing you want each party repeated for each year, such as rep(c("a", "b", "c"), times = 3). year is also too short; I think you meant to have a third 1992 in that vector. Also keep in mind that ggplot expects long-shaped data. How do you intend for the interaction of these two variables to be displayed? - camille

Thank you very much for your useful comments! I intended to give an example of my data here but made a mistake, as you already noticed (year and party were too short). I just edited it, so it should be reproducible now. - Ruebenkraut

1 Answer

1
votes

I used a dplyr pipeline to:

  • create an adj_vote column holding each party's share of the turnout: turnout multiplied by voteshare, divided by 100. Within a year these values add up to that year's turnout.
  • drop the rows where adj_vote is zero so that no "0" labels appear in the final output.
  • calculate the y value at which each label should be drawn by taking the cumsum() of the reversed adj_vote values, grouped by year. I had to use rev() because position_stack() by default puts the party that comes first alphabetically at the top of the stack, so the stack is built from the last party upwards.
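To make the arithmetic concrete, here is a small base R sketch for the 1990 rows only, using the numbers from the question. It is an illustration of the steps above rather than part of the plotting code, and the zero row for party "a" is kept here just to show the full sum.

```r
# 1990 rows from the question: parties a, b, c
turnout   <- c(96.7, 96.7, 96.7)
voteshare <- c(0, 33.5, 66.5)

# Height of each party's segment: its share of the turnout
adj_vote <- turnout * voteshare / 100
adj_vote            # 0, 32.3945, 64.3055

# The segments add up to the turnout, so the bar is 96.7 tall
sum(adj_vote)

# Label heights, built from the bottom of the stack (party c) upwards
cum_vote <- cumsum(rev(adj_vote))
cum_vote            # 64.3055, 96.7, 96.7
```

Each cum_vote value is the upper edge of a segment, which is where the code below draws the corresponding label.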

Code

library(dplyr)
library(ggplot2)

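df <- df %>%
  # scale each party's share so the shares in a year sum to the turnout
  mutate(adj_vote = turnout * voteshare / 100) %>%
  # drop zero-height segments so they get no label
  filter(adj_vote > 0) %>%
  group_by(year) %>% 
  # label heights and label text, both in bottom-to-top stack order
  mutate(cum_vote = cumsum(rev(adj_vote)),
         vote_label = rev(voteshare)) %>%
  ungroup()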


ggplot(df, aes(x=year, y=adj_vote, fill=party)) + 
  geom_bar(stat="identity", position=position_stack()) +
  geom_text(aes(label=vote_label, y = cum_vote), vjust=0.5)
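As a possible alternative, the labels can be positioned by ggplot2 itself: position_stack(vjust = 0.5) places each label in the middle of its segment, so the cumsum()/rev() bookkeeping is not needed. The sketch below also adds the total turnout above each bar, which the question asked for. It rebuilds the data from the question so that it runs on its own; the names plot_df, turnout_df and p are made up for this example, and treating year as a factor on the x axis is my assumption.

```r
library(dplyr)
library(ggplot2)

# Data as given in the question
df <- data.frame(
  party     = rep(c("a", "b", "c"), times = 3),
  year      = rep(c(1990, 1991, 1992), each = 3),
  voteshare = c(0, 33.5, 66.5, 40.5, 39.0, 20.5, 33.6, 33.4, 33),
  turnout   = rep(c(96.7, 85.05, 76.41), each = 3)
)

# Segment heights; zero rows are dropped so they get no label
plot_df <- df %>%
  mutate(adj_vote = turnout * voteshare / 100) %>%
  filter(adj_vote > 0)

# One row per year, used for the turnout label above each bar
turnout_df <- distinct(plot_df, year, turnout)

p <- ggplot(plot_df, aes(x = factor(year), y = adj_vote, fill = party)) +
  geom_col() +
  # vote share, centred vertically inside each segment
  geom_text(aes(label = voteshare),
            position = position_stack(vjust = 0.5)) +
  # total turnout, just above the top of each bar
  geom_text(data = turnout_df,
            aes(x = factor(year), y = turnout, label = turnout),
            inherit.aes = FALSE, vjust = -0.5) +
  labs(x = "year", y = "turnout (%)")

p
```

Because the text layer uses the same fill grouping as the bars, the labels are stacked in the same order as the segments they describe.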

Output

[Image: ggplot2 output]