0
votes

Survey data often contains multiple choice columns with entries separated by commas, for instance:

library("tidyverse")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

It's desirable to have a function multiple_choice_tally that will tally the unique responses for the question:

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3

What is the most efficient and flexible way to construct multiple_choice_tally, without any hard coding.

2
What about table, strsplit, and unlist? table(unlist(strsplit(c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk"), split="[, ]+"))).lmo

2 Answers

3
votes

We can use separate_rows from the tidyr package to expand the contents in question.2. Since you are using tidyverse, tidyr has been already loaded with library("tidyverse") and we don't have to load it again. my_survey2 is the final output.

my_survey2 <- my_survey %>%
  separate_rows(question.2) %>%
  count(question.2) %>%
  rename(response = question.2, count = n)

my_survey2
# A tibble: 3 × 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2

Update: Design a Function

We can convert the above code into a function as follows.

multiple_choice_tally <- function(survey.data, question){
  question <- enquo(question)
  survey.data2 <- survey.data %>%
    separate_rows(!!question) %>%
    count(!!question) %>%
    setNames(., c("response", "count"))
  return(survey.data2)
}

my_survey %>%
  multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
0
votes

My current solution for this problem is as follows:

multiple_choice_tally <- function(survey.data, question) {
  ## Require a sym for the RHS of !!response := if_else
  question_as_quo <- enquo(question)
  question_as_string <- quo_name(question_as_quo)
  target_question <- rlang::sym(question_as_string)

  ## Collate unique responses to the question
  unique_responses <- survey.data %>%
    select(!!target_question) %>%
    na.omit() %>%
    .[[1]] %>%
    strsplit(",") %>%
    unlist() %>%
    trimws() %>%
    unique()

  ## Extract responses to question
  question_tally <- survey.data %>%
    select(!!target_question) %>%
    na.omit()

  ## Iteratively create a column for each unique response
  invisible(lapply(unique_responses,
                   function(response) {
                     question_tally <<- question_tally %>%
                       mutate(!!response := if_else(str_detect(!!target_question, response), TRUE, FALSE))

                   }))

  ## Gather into tidy form
  question_tally %>%
    summarise_if(is.logical, funs(sum)) %>%
    gather(response, value = count)

}

Which can then be used as follows:

library("tidyverse")
library("rlang")
library("stringr")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3