I have a data frame that contains the answers of survey respondents to some items (columns: var1, var2, var3, ...). There is also a column indicating which items were seen by the respondent (vars_seen). The data frame looks like this:
df <- data.frame(var1 = c(-99, 1, 1, -99, 1, 1),
                 var2 = c(-99, -99, 1, 1, -99, 1),
                 var3 = c(-99, -99, -99, -99, 1, 1),
                 vars_seen = c("var1", "var1", "var1,var2", "var1,var2",
                               "var1,var2,var3", "var1,var2,var3"))
var1 | var2 | var3 | vars_seen |
---|---|---|---|
-99 | -99 | -99 | var1 |
1 | -99 | -99 | var1 |
1 | 1 | -99 | var1,var2 |
-99 | 1 | -99 | var1,var2 |
1 | -99 | 1 | var1,var2,var3 |
1 | 1 | 1 | var1,var2,var3 |
All respondents could have seen all 3 items, but some respondents didn't finish.
Now I want to set the value to -77 for every item that a respondent could have seen but didn't.
The data frame should look like this:
var1 | var2 | var3 | vars_seen |
---|---|---|---|
-99 | -77 | -77 | var1 |
1 | -77 | -77 | var1 |
1 | 1 | -77 | var1,var2 |
-99 | 1 | -77 | var1,var2 |
1 | -99 | 1 | var1,var2,var3 |
1 | 1 | 1 | var1,var2,var3 |
As you can see, some -99s remain (for items that were seen), so I cannot simply replace every -99 with -77.
What I tried so far:
# create a vector with all items that could have been seen
vars <- c("var1", "var2", "var3")
# split the comma-separated string into a character vector per row
df$vars_seen.vec <- stringr::str_split(df$vars_seen, ",")
# create a function that returns the items missing from vars_seen
notseen <- function(vars_seen, vars) {
  setdiff(vars, vars_seen)
}
# apply the function to each row; lapply always returns a list,
# whereas sapply may simplify the result when all rows have the same length
df$vars_notseen.vec <- lapply(df$vars_seen.vec, notseen, vars)
Now we have a column that lists, for each respondent, the items they didn't see. But I don't know how to set the corresponding variables to -77 based on that column.
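One possible way to finish this is to loop over the items instead of the rows: for each item, flag the rows whose vars_seen does not contain it and overwrite only those cells. The following is a minimal sketch in base R, not a definitive solution. It assumes the item names in vars_seen are separated by plain commas without spaces, it uses base `strsplit` in place of `stringr::str_split`, and the names `seen` and `not_seen` are hypothetical helpers introduced for illustration.

```r
# Example data from the question
df <- data.frame(
  var1 = c(-99, 1, 1, -99, 1, 1),
  var2 = c(-99, -99, 1, 1, -99, 1),
  var3 = c(-99, -99, -99, -99, 1, 1),
  vars_seen = c("var1", "var1", "var1,var2", "var1,var2",
                "var1,var2,var3", "var1,var2,var3")
)

# All items that could have been seen
vars <- c("var1", "var2", "var3")

# One character vector of seen items per row
# (as.character guards against vars_seen being a factor in older R versions)
seen <- strsplit(as.character(df$vars_seen), ",", fixed = TRUE)

# For each item, find the rows where it was not seen and set those cells to -77.
# Cells of seen items are left untouched, so existing -99 values there survive.
for (v in vars) {
  not_seen <- !vapply(seen, function(s) v %in% s, logical(1))
  df[[v]][not_seen] <- -77
}

print(df)
```

Because only cells of unseen items are overwritten, a -99 in a seen item (for example var1 in the first row) stays as it is. If the strings in vars_seen can contain spaces after the commas, the split values would need trimming (for example with `trimws`) before the `%in%` comparison.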