I have a 'check all that apply' survey item that I would like to process. The responses are stored in a single string variable that holds every choice a respondent made. Respondents may select any of 21 options that apply to them. I would like to create a set of 21 dummy variables, each indicating whether a respondent selected a particular option.
Three example responses are:
id x
1 3, 13
2 1, 3, 8, 9, 11, 13
3 1, 9
...
And what I would like is:
id  x                   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13
1   3, 13                0  0  1  0  0  0  0  0  0   0   0   0   1
2   1, 3, 8, 9, 11, 13   1  0  1  0  0  0  0  1  1   0   1   0   1
3   1, 9                 1  0  0  0  0  0  0  0  1   0   0   0   0
...
In my attempt to do this, I've read an id variable and the response variable
into a list jp such that each respondent has an id in jp[[1]] and his/her
response in jp[[2]]:
> jp[[2]][1:3]
[1] "3, 13 "
[2] "1, 3, 8, 9, 11, 13 "
[3] "1, 9 "
I then cleaned them up via strsplit on the commas and put that in jp[[4]]:
> jp[[4]][1:3]
[[1]]
[1] "3" "13"
[[2]]
[1] "1" "3" "8" "9" "11" "13"
[[3]]
[1] "1" "9"
I found the unique values across all list elements:
> taught <- as.character(sort(as.numeric(unique(unlist(jp[[4]])))))
> taught
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "256"
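For anyone who wants to reproduce the steps so far, here is a minimal sketch using only the three example responses. The names `x`, `choices`, and `codes` are hypothetical stand-ins for jp[[2]], jp[[4]], and 'taught'; the whitespace stripping with gsub() is an assumption about how the cleanup was done, not necessarily what was actually run.

```r
# Hypothetical stand-in for jp[[2]], built from the three example responses
x <- c("3, 13 ", "1, 3, 8, 9, 11, 13 ", "1, 9 ")

# Remove all whitespace, then split each response on commas
# (stand-in for jp[[4]])
choices <- strsplit(gsub("\\s+", "", x), ",", fixed = TRUE)

# Sorted unique option codes, kept as character (stand-in for 'taught')
codes <- as.character(sort(as.numeric(unique(unlist(choices)))))

print(choices)
print(codes)
```

With only three respondents, `codes` contains just the six options that appear in the examples rather than all 21.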
Through a little trial and error, I figured out I could check each respondent's choices for a single option as follows:
sapply(jp[[4]], function(x) any(x == "1"))
And this appears to work ok:
> table(sapply(jp[[4]], function(x) any(x == "1")))
FALSE  TRUE
 9404  1891
This is the prevalence I expect.
However, because each respondent can have anywhere from 0 to 21 responses (sublist elements), I figured I needed to loop through each unique response in each respondent's sublist, writing the results to a new list element.
I'm hoping to take the list element jp[[4]], which holds the cleaned-up responses,
and loop through each element of 'taught' to see whether it exists in each respondent's
sublist.
bla <- function(dt, lst) {
  for (i in 1:length(lst)) {
    subs <- list()
    # apply function on each part, by row
    subs[[i]] <- sapply(dt, function(x) any(x == taught[i]))
  }
  return(subs)
}
bla(jp[[4]], taught)
Unfortunately, it only appears to work for the last (the 21st, or '256') element in 'taught', and does not save to my list 'subs' I defined in the function.
> table(bla(jp[[4]], taught)[21])
FALSE  TRUE
10645   650
> table(sapply(jp[[4]], function(x) any(x == "256")))
FALSE  TRUE
10645   650
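One possible explanation, offered as a sketch rather than a verified diagnosis: `subs <- list()` sits inside the for loop, so the list is emptied on every pass and only the last element survives. The version below creates the list once before the loop and uses the `lst` argument instead of the global 'taught'. The function name `bla2` and the toy objects `choices` and `codes` are hypothetical, built from the three example responses. A vectorised alternative that returns the 0/1 matrix directly is included for comparison.

```r
# Toy stand-ins (assumptions): 'choices' plays the role of jp[[4]],
# 'codes' plays the role of 'taught'
choices <- list(c("3", "13"),
                c("1", "3", "8", "9", "11", "13"),
                c("1", "9"))
codes <- as.character(1:13)

bla2 <- function(dt, lst) {
  # create the result list once, before the loop
  subs <- vector("list", length(lst))
  for (i in seq_along(lst)) {
    # compare against the 'lst' argument, not a global variable
    subs[[i]] <- sapply(dt, function(x) any(x == lst[i]))
  }
  names(subs) <- paste0("x", lst)
  subs
}
res <- bla2(choices, codes)

# Vectorised alternative: one row per respondent, one 0/1 column per code
dummies <- t(sapply(choices, function(x) as.integer(codes %in% x)))
colnames(dummies) <- paste0("x", codes)
print(dummies)
```

On the real data, something like `cbind(id = jp[[1]], dummies)` might then give the layout described at the top, assuming jp[[1]] holds the ids in the same order as jp[[4]].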
Suggestions welcome. Thanks.