1
votes

I am hoping to use ggplot to construct a barplot of frequencies (or just % 1s) of a bunch of binary variables, and am having trouble getting them all together on one plot.

The variables all stem from the same question in a survey, so ideally it'd be nice to have data that is tidy with one column for this variable, but respondents could select more than one option and I'm hoping to retain that instead of having a "more than one selected" option. Here is a slice of the data:

structure(list(gender = structure(c("Male", "Male", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male"), label = "Q4", format.stata = "%24s"), 
    var1 = structure(c("0", "0", "1", "1", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var2 = structure(c("0", 
    "98", "1", "0", "0", "0", "0", "0", "0", "0"), format.stata = "%9s"), 
    var3 = structure(c("0", "0", "0", "0", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var4 = structure(c("1", 
    "0", "1", "0", "0", "0", "1", "1", "0", "0"), format.stata = "%9s"), 
    var5 = structure(c("1", "0", "0", "0", "0", "1", "0", "0", 
    "0", "0"), format.stata = "%9s")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
1
Can you clarify what you are looking to construct based on the data provided? It's not clear to me how you want to organize your intended chart, given the example data. Also, is there supposed to be a 98 in there, or is it supposed to be all 1s and 0s? - chemdork123
@chemdork123 Sorry, the 98 represents a missing value. And the ideal chart would have var1, var2, var3, etc. along the x axis and with a frequency or percentage of 1s along the y for each respective var. - todd_b

1 Answer

2
votes

Get the data in long format so that it is easier to plot.

library(tidyverse)

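df %>%
  # Reshape to long format: one row per respondent per variable
  pivot_longer(cols = starts_with('var')) %>%
  group_by(name) %>%
  # Count the 1s in each variable
  summarise(frequency_of_1 = sum(value == 1)) %>%
  # For a proportion, use mean() instead of sum() (multiply by 100 for a percentage).
  # Note that rows coded 98 (missing) would still count in the denominator.
  #summarise(frequency_of_1 = mean(value == 1)) %>%
  ggplot() + aes(name, frequency_of_1) + geom_col()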
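
Since the comments say 98 marks a missing value, you may want to drop those responses from the denominator when computing percentages. Here is a minimal sketch of one way to do that, assuming the values are stored as character strings as in the posted slice. The data frame is rebuilt by hand from the question (without the Stata attributes), and the names `pct` and `percent_of_1` are just illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Plain copy of the slice posted in the question (Stata attributes dropped)
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male", "Male"),
  var1 = c("0", "0", "1", "1", "0", "0", "0", "0", "0", "0"),
  var2 = c("0", "98", "1", "0", "0", "0", "0", "0", "0", "0"),
  var3 = c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0"),
  var4 = c("1", "0", "1", "0", "0", "0", "1", "1", "0", "0"),
  var5 = c("1", "0", "0", "0", "0", "1", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

pct <- df %>%
  pivot_longer(cols = starts_with("var")) %>%
  # Assumption: "98" is the only missing-value code
  mutate(value = na_if(value, "98")) %>%
  group_by(name) %>%
  # Percentage of 1s among non-missing responses only
  summarise(percent_of_1 = 100 * mean(value == "1", na.rm = TRUE))

p <- ggplot(pct, aes(name, percent_of_1)) +
  geom_col() +
  labs(x = NULL, y = "% selecting option")
print(p)
```

With this slice, var2 comes out at 1 of 9 non-missing responses (about 11.1%) rather than 1 of 10.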


In base R you can do this with colSums and barplot.

# df[-1] drops the first column (gender)
barplot(colSums(df[-1] == 1))
# For the proportion of 1s (rows coded 98 still count in the denominator)
#barplot(colMeans(df[-1] == 1))
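
The base R version can also leave the 98s out of the denominator by recoding them to NA first. This is only a sketch, assuming "98" is the sole missing-value code and that the columns are character as in the posted slice. The data frame below is a hand-made copy of the question's data, and `vars` and `pct_base` are illustrative names.

```r
# Plain copy of the slice posted in the question (Stata attributes dropped)
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Female",
             "Female", "Male", "Male", "Male", "Male"),
  var1 = c("0", "0", "1", "1", "0", "0", "0", "0", "0", "0"),
  var2 = c("0", "98", "1", "0", "0", "0", "0", "0", "0", "0"),
  var3 = c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0"),
  var4 = c("1", "0", "1", "0", "0", "0", "1", "1", "0", "0"),
  var5 = c("1", "0", "0", "0", "0", "1", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

vars <- df[-1]              # drop the gender column
vars[vars == "98"] <- NA    # recode the missing-value code to NA

# Percentage of 1s per variable, ignoring missing responses
pct_base <- 100 * colMeans(vars == "1", na.rm = TRUE)

barplot(pct_base, ylab = "% selecting option")
```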