Tallying multiple choice entries in a single column in an R data frame programmatically

Question

Survey data often contains multiple choice columns with entries separated by commas, for instance:

library("tidyverse")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

It's desirable to have a function multiple_choice_tally that counts each unique response to the question:

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3

What is the most efficient and flexible way to construct multiple_choice_tally without hard-coding the possible responses?

What about table, strsplit, and unlist? table(unlist(strsplit(c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk"), split="[, ]+"))). — lmo
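lmo's one-liner can be wrapped into a reusable function without any packages. The sketch below is base R only; the name tally_base and its sep default are illustrative assumptions, not part of the question or the answer. It splits on a comma followed by optional whitespace (",\\s*") instead of "[, ]+", so a response that itself contains a space (such as a hypothetical "On foot") stays in one piece.

```r
# Hypothetical helper: base-R version of the table/strsplit/unlist idea.
# x   : character vector of comma-separated answers
# sep : regex used to split answers (assumed: comma + optional whitespace)
tally_base <- function(x, sep = ",\\s*") {
  # Split every entry and flatten into a single character vector
  pieces <- unlist(strsplit(x, split = sep))
  # Trim stray whitespace and drop empty strings
  pieces <- trimws(pieces)
  pieces <- pieces[nzchar(pieces)]
  # table() counts each distinct response (NA values are excluded)
  counts <- table(pieces)
  data.frame(
    response = names(counts),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

answers <- c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
tally_base(answers)
```

Because table() sorts its levels, the rows come back in alphabetical order rather than in order of first appearance.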

www · Accepted Answer · 2017-08-15T16:47:08

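We can use separate_rows from the tidyr package to split the entries in question.2 into one row per response. Since you are using the tidyverse, tidyr has already been loaded by library("tidyverse"), so there is no need to load it again. By default, separate_rows splits on any run of characters that are not letters, digits, or periods (sep = "[^[:alnum:].]+"), which is why "Bus, Walk, Cycle" becomes three rows; if a response can itself contain spaces, pass an explicit separator such as sep = ",\\s*". my_survey2 holds the final output.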

my_survey2 <- my_survey %>%
  separate_rows(question.2) %>%
  count(question.2) %>%
  rename(response = question.2, count = n)

my_survey2
# A tibble: 3 × 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
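To make the expansion step concrete, here is a rough base-R equivalent of what separate_rows does before count() runs. It is an illustration only: it splits on a comma plus optional whitespace rather than separate_rows' default separator, and the object names survey, parts, and long are made up for this sketch.

```r
# Base-R sketch of the row expansion: each comma-separated answer
# becomes its own row, with the respondent's id repeated.
survey <- data.frame(
  id = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk"),
  stringsAsFactors = FALSE
)

# One character vector of responses per original row
parts <- strsplit(survey$question.2, split = ",\\s*")

long <- data.frame(
  id = rep(survey$id, lengths(parts)),  # repeat each id once per response
  question.2 = unlist(parts),
  stringsAsFactors = FALSE
)

nrow(long)  # 8 rows: 1 + 3 + 1 + 2 + 1
# Counting the long column gives the same tally as count()
table(long$question.2)
```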

Update: Design a Function

We can convert the above code into a function as follows.

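multiple_choice_tally <- function(survey.data, question){
  # Capture the unquoted column name so it can be used inside tidyverse verbs
  question <- enquo(question)
  survey.data2 <- survey.data %>%
    # One row per individual response
    separate_rows(!!question) %>%
    # Count occurrences of each distinct response
    count(!!question) %>%
    # Name the two output columns
    setNames(., c("response", "count"))
  return(survey.data2)
}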

my_survey %>%
  multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
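With rlang 0.4.0 or later, the enquo()/!! pair can be replaced by the {{ }} ("curly-curly") operator, and exposing a sep argument lets callers handle responses that contain spaces. This is a sketch assuming a recent dplyr and tidyr (both are loaded by library("tidyverse")); the sep argument and its ",\\s*" default are additions, not part of the accepted answer.

```r
library(dplyr)
library(tidyr)

# Variant of multiple_choice_tally with a configurable separator
# (sep and its default are assumptions added for this sketch).
multiple_choice_tally <- function(survey.data, question, sep = ",\\s*") {
  survey.data %>%
    # Split each cell into one row per response
    separate_rows({{ question }}, sep = sep) %>%
    # Count each distinct response, naming the count column "count"
    count({{ question }}, name = "count") %>%
    # Rename the response column to a fixed name
    rename(response = {{ question }})
}

my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

my_survey %>%
  multiple_choice_tally(question = question.2)
```

Keeping the default separator narrow (comma plus optional whitespace) trades a little generality for safety: answers like "On foot" are counted as one response instead of two.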
