I have a data frame of survey responses, and some of the columns are questions where participants can select multiple answers ("select all that apply").
> age <- c(24, 28, 44, 55, 53)
> ethnicity <- c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi")
> ethnicity_other <- c(NA, NA, "luvale", NA, NA)
> df <- data.frame(age, ethnicity, ethnicity_other)
I would like to recode those questions as binary-response items, so that each response choice found in ethnicity or ethnicity_other (for example bemba, tonga, or luvale) becomes its own column containing either a 0 or a 1.
So far, I have written a script that collects the unique individual responses into a list (z):
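> # Split each answer on spaces and keep the unique tokens from each column
> x <- unique(as.vector(unlist(strsplit(as.character(df$ethnicity_other), " ")), mode="list"))
> y <- unique(as.vector(unlist(strsplit(as.character(df$ethnicity), " ")), mode="list"))
>
> combine <- c(x, y)
>
> # Drop the NA entries that come from empty ethnicity_other cells
> z <- NULL
> for(i in combine){
>   if(!is.na(i)){
>     z <- append(z, i)
>   }
> }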
I then created new columns from that list and filled them with NA values.
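> # Skip the token "other": paste0("ethnicity_", "other") would overwrite
> # the existing free-text column ethnicity_other with NA
> for(elm in setdiff(z, "other")){
>   df[paste0("ethnicity_", elm)] <- NA
> }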
In my full data set this gives me 35 additional columns, which I would like to fill with ones and zeros depending on whether the column name (minus the ethnicity_ prefix I add) can be found in the corresponding cell under ethnicity or ethnicity_other. I have tried a number of approaches without finding a good solution.
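To make the goal concrete, this is roughly the fill step I am after, sketched on the sample data above. It is only a sketch under some assumptions of mine: answers are separated by single spaces and should match as whole tokens, the helper name `row_tokens` is purely illustrative, and the literal token "other" is skipped so that the existing ethnicity_other column is not overwritten. I am not sure it is the right approach for the full data.

```r
# Sample data from the question
age <- c(24, 28, 44, 55, 53)
ethnicity <- c("ngoni", "bemba", "lozi tonga", "bemba tonga other", "bemba tongi")
ethnicity_other <- c(NA, NA, "luvale", NA, NA)
df <- data.frame(age, ethnicity, ethnicity_other, stringsAsFactors = FALSE)

# Split both columns into per-row tokens (an NA cell yields a single NA token)
tokens_main  <- strsplit(as.character(df$ethnicity), " ", fixed = TRUE)
tokens_other <- strsplit(as.character(df$ethnicity_other), " ", fixed = TRUE)

# Combine the tokens of each row and drop the NA tokens
# (row_tokens is a hypothetical helper name, not part of my original script)
row_tokens <- Map(function(a, b) {
  v <- c(a, b)
  v[!is.na(v)]
}, tokens_main, tokens_other)

# Unique response choices; "other" is skipped (my assumption) so that the
# free-text column ethnicity_other is left untouched
z <- setdiff(unique(unlist(row_tokens)), "other")

# One 0/1 column per choice: 1 if the row's tokens contain that choice
for (elm in z) {
  df[[paste0("ethnicity_", elm)]] <- vapply(
    row_tokens,
    function(tok) as.integer(elm %in% tok),
    integer(1)
  )
}
```

Matching on whole tokens rather than substrings is deliberate here, so that "tongi" in the last row does not set ethnicity_tonga to 1.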