Convert "select all that apply" to binary choices

Question

I have a data frame of survey responses, and some of the columns are questions where participants can select multiple answers ("select all that apply").

> age <- c(24, 28, 44, 55, 53)
> ethnicity <- c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi")
> ethnicity_other <- c(NA, NA, "luvale", NA, NA) 
> df <- data.frame(age, ethnicity, ethnicity_other)

I would like those questions to be set up as binary-response items, so that each response choice found in ethnicity and ethnicity_other becomes its own column containing either a 0 or a 1.

So far, I wrote a script that separates the individual unique responses into a list (z):

> x <- unique(as.vector(unlist(strsplit(as.character(df$ethnicity_other), " ")), mode="list"))
> y <- unique(as.vector(unlist(strsplit(as.character(df$ethnicity), " ")), mode="list"))
>
> combine <- c(x, y)
>
> z <- NULL
> for(i in combine){
>   if(!is.na(i)){
>     z <- append(z, i)
>   }
> }

I then created new columns from that list and filled them with NA values.

> for(elm in z){
>   df[paste0("ethnicity_",elm)]  <- NA
> }

So now I have 35 additional columns that I would like to fill with ones and zeros, depending on whether that column name (minus the ethnicity_ prefix I added) can be found in the corresponding cell under ethnicity or ethnicity_other. I have tried several approaches but have not found a good solution.

Jake Burkhead · Accepted Answer · 2014-02-20T00:11:25

Here are a couple of ways to do this, with plyr or data.table.

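# Collect every distinct response; as.character() guards against factor columns
all_ethnicities <- unique(c(
    unlist(strsplit(as.character(df$ethnicity), " ")),
    unlist(strsplit(as.character(df$ethnicity_other), " "))
    ))

# Row identifier to group by
df$id <- seq_len(nrow(df))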

library(plyr)

ddply(df, .(id), function(x)
      table(factor(unlist(strsplit(paste(x$ethnicity, x$ethnicity_other), " ")),
                   levels = all_ethnicities)))

##    id ngoni bemba lozi tonga other tongi luvale
## 1  1     1     0    0     0     0     0      0
## 2  2     0     1    0     0     0     0      0
## 3  3     0     0    1     1     0     0      1
## 4  4     0     1    0     1     1     0      0
## 5  5     0     1    0     0     0     1      0

library(data.table)

DT <- data.table(df)

DT[, {
    as.list(
        table(
            factor(
                unlist(strsplit(paste(ethnicity, ethnicity_other),  " ")),
                levels = all_ethnicities)
            )
        )
}, by = id]

##     id ngoni bemba lozi tonga other tongi luvale
## 1:  1     1     0    0     0     0     0      0
## 2:  2     0     1    0     0     0     0      0
## 3:  3     0     0    1     1     0     0      1
## 4:  4     0     1    0     1     1     0      0
## 5:  5     0     1    0     0     0     1      0
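
If you would rather not load a package, the same indicator columns can be sketched in base R. This is only a minimal sketch under some assumptions: responses are separated by single spaces, missing values should be ignored, and the names `choices`, `all_choices`, `indicators` and `result` are placeholders chosen for illustration. The new columns use the `ethnicity_` prefix from the question.

```r
# Example data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  age = c(24, 28, 44, 55, 53),
  ethnicity = c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi"),
  ethnicity_other = c(NA, NA, "luvale", NA, NA),
  stringsAsFactors = FALSE
)

# One character vector of selected choices per respondent, with NA and
# empty tokens dropped
choices <- lapply(seq_len(nrow(df)), function(i) {
  tokens <- unlist(strsplit(c(df$ethnicity[i], df$ethnicity_other[i]), " ", fixed = TRUE))
  tokens[!is.na(tokens) & nzchar(tokens)]
})

# Every distinct choice, in order of first appearance
all_choices <- unique(unlist(choices))

# One row per respondent, one 0/1 column per choice
indicators <- t(vapply(
  choices,
  function(sel) as.integer(all_choices %in% sel),
  integer(length(all_choices))
))
colnames(indicators) <- paste0("ethnicity_", all_choices)

# Attach the indicator columns to the original data frame
result <- cbind(df, indicators)
```

Because `%in%` tests membership rather than counting, a choice that appears in both ethnicity and ethnicity_other for the same respondent still yields 1, whereas the `table()` approach would yield 2.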
