0
votes

I have a data frame that contains the answers of survey respondents to some items (columns: var1, var2, var3, ...). There is also a column indicating which items were seen by the respondent (vars_seen). The data frame looks like this:

df <- data.frame(list(var1=c(-99,1,1,-99,1,1),
                      var2=c(-99,-99,1,1,-99,1),
                      var3=c(-99,-99,-99,-99,1,1),
                      vars_seen=c("var1", "var1", "var1,var2", "var1,var2", "var1,var2,var3", "var1,var2,var3")))
var1 var2 var3 vars_seen
-99 -99 -99 var1
1 -99 -99 var1
1 1 -99 var1,var2
-99 1 -99 var1,var2
1 -99 1 var1,var2,var3
1 1 1 var1,var2,var3

All respondents could have seen all 3 items. But some respondents didn't finish.

Now I want to set the value to -77 for every item that a respondent could have seen but didn't.

The data frame should look like this:

var1 var2 var3 vars_seen
-99 -77 -77 var1
1 -77 -77 var1
1 1 -77 var1,var2
-99 1 -77 var1,var2
1 -99 1 var1,var2,var3
1 1 1 var1,var2,var3

As you can see, some -99 values remain, so I cannot simply replace every -99 with -77.

What I tried so far:

# create a vector with all items that could have been seen
vars <- c("var1","var2","var3")

# split the comma-separated string to a vector
df$vars_seen.vec <- stringr::str_split(df$vars_seen,",")

# create a function that applies a setdiff on each row
notseen <- function(vars_seen,vars){
  setdiff(vars, vars_seen)
}  

# apply the function
df$vars_notseen.vec <- sapply(df$vars_seen.vec, notseen, vars)

Now we have a column that contains, for each respondent, all items they did not see. But I don't know how to set the item values based on that.
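
One possible way to finish the approach above is sketched here: walk over the rows and overwrite only the columns listed as not seen for that row. This is a minimal sketch, not a definitive solution. It uses base strsplit in place of stringr::str_split, keeps the lists in plain variables (seen, notseen) rather than as data frame columns, and the loop variables r and cols are names introduced for illustration only.

```r
# Rebuild the example data so the sketch is self-contained.
df <- data.frame(var1 = c(-99, 1, 1, -99, 1, 1),
                 var2 = c(-99, -99, 1, 1, -99, 1),
                 var3 = c(-99, -99, -99, -99, 1, 1),
                 vars_seen = c("var1", "var1", "var1,var2", "var1,var2",
                               "var1,var2,var3", "var1,var2,var3"),
                 stringsAsFactors = FALSE)

# All items that could have been seen.
vars <- c("var1", "var2", "var3")

# Split the comma-separated strings, then take the set difference per row.
seen    <- strsplit(df$vars_seen, ",", fixed = TRUE)
notseen <- lapply(seen, function(s) setdiff(vars, s))

# For each row, overwrite only the columns that were not seen.
for (r in seq_len(nrow(df))) {
  cols <- notseen[[r]]
  if (length(cols) > 0) {   # skip rows where every item was seen
    df[r, cols] <- -77
  }
}
```

The -99 values in columns that were seen stay untouched, because only the columns named in notseen are assigned.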

2
I don't fully understand: why not use indicator variables alongside the variable scores? E.g. df <- data.frame(list(var1=c(0,1,1,0,1,1), score1=c(0,2,3,0,4,5),... – oaxacamatt

2 Answers

0
votes

We can split the 'vars_seen' column on "," and do the replacement row by row within Map:

df[1:3] <- do.call(rbind, Map(function(x, y) 
     replace(x, !names(x) %in% y, -77), asplit(df[1:3], 1), 
         strsplit(df$vars_seen, ",")))

-output

df
#  var1 var2 var3      vars_seen
#1  -99  -77  -77           var1
#2    1  -77  -77           var1
#3    1    1  -77      var1,var2
#4  -99    1  -77      var1,var2
#5    1  -99    1 var1,var2,var3
#6    1    1    1 var1,var2,var3
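
As a hedged variation on the same idea, the membership test can be done once for the whole table: build a logical matrix that is TRUE where an item was seen, then assign -77 through the negated matrix. This is only a sketch; the names vars, seen and was_seen are introduced here for illustration and are not part of the answer above.

```r
# Rebuild the example data so the sketch is self-contained.
df <- data.frame(var1 = c(-99, 1, 1, -99, 1, 1),
                 var2 = c(-99, -99, 1, 1, -99, 1),
                 var3 = c(-99, -99, -99, -99, 1, 1),
                 vars_seen = c("var1", "var1", "var1,var2", "var1,var2",
                               "var1,var2,var3", "var1,var2,var3"),
                 stringsAsFactors = FALSE)

vars <- c("var1", "var2", "var3")
seen <- strsplit(df$vars_seen, ",", fixed = TRUE)

# One row per respondent, one column per item: TRUE where the item was seen.
was_seen <- sapply(vars, function(v) {
  vapply(seen, function(s) v %in% s, logical(1))
})

# Logical-matrix indexing on the item columns; unseen cells become -77.
df[vars][!was_seen] <- -77
```

Because the assignment goes through the data frame directly, the item columns keep their original types instead of passing through a matrix built with rbind.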
0
votes

After struggling with this problem for 2 months, I found a way to make it work one day after posting this question:

df <- data.frame(list(var1=c(-99,1,1,-99,1,1),
                      var2=c(-99,-99,1,1,-99,1),
                      var3=c(-99,-99,-99,-99,1,1),
                      vars_seen=c("var1", "var1", "var1,var2", "var1,var2", "var1,var2,var3", "var1,var2,var3")))

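vars <- c("var1","var2","var3")

# %like% comes from data.table, so the package must be loaded first
library(data.table)

# for each item, set -77 in the rows whose vars_seen does not mention it
for (i in vars){
  df[[i]][! df$vars_seen %like% i] <- -77
}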
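
One caveat, assuming %like% here is the data.table operator: to my understanding it does regular-expression matching, so an item name that is a prefix of another one would also match. The sketch below illustrates this with base grepl and a hypothetical item var10 that is not part of the question's data, and shows an exact-match helper (has_item, a name made up for this example) built on strsplit.

```r
# Hypothetical vars_seen values; var10 does not exist in the original data.
seen_strings <- c("var1", "var10", "var1,var10")

# Pattern matching: "var1" is also found inside "var10".
partial <- grepl("var1", seen_strings)
# partial is TRUE TRUE TRUE

# Exact matching: split on commas and compare whole item names.
has_item <- function(item, seen_strings) {
  tokens <- strsplit(seen_strings, ",", fixed = TRUE)
  vapply(tokens, function(s) item %in% s, logical(1))
}

exact <- has_item("var1", seen_strings)
# exact is TRUE FALSE TRUE
```

With only var1, var2 and var3 the two approaches agree, so the loop above gives the expected result for the data in the question.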