Long-time lurker, first-time poster. I am relatively new to R and I am cleaning a 200-row survey dataset. The survey used quite a bit of branching, and it returns NAs where the participant never received the question, which is throwing off my missing data analysis. I have spent a day and a half trying to find an elegant solution (for loops, if conditions, dplyr's filter(), mutate(), and case_when()). I settled on a solution using mutate(), but I know there's a better way to do this.
Here's a reproducible data frame to act as the survey data:
df <- data.frame(
  attsess1 = c(1, 1, 1, -99, -99),
  attsess2 = c(-99, -99, 1, 1, 1),
  s1satis  = c(1, 1, 1, NA, NA),
  s1time   = c(1, 1, 1, NA, NA),
  s1qual   = c(1, 1, NA, NA, NA),
  s2satis  = c(NA, NA, NA, 1, 1),
  s2time   = c(NA, NA, 1, 1, 1)
)
Essentially, if attsess1 equals -99, then I want the NAs in the corresponding rows of s1satis, s1time, and s1qual to change to -99. The same logic applies to attsess2 with s2satis and s2time.
The code below is what I used. It works, but it takes too many lines, which could be problematic for large datasets with many variables:
library(dplyr)
df1 <- df %>%
  mutate(s1satis = case_when(attsess1 == -99 ~ -99)) %>%
  mutate(s1time = case_when(attsess1 == -99 ~ -99)) %>%
  mutate(s1qual = case_when(attsess1 == -99 ~ -99)) %>%
  mutate(s2satis = case_when(attsess2 == -99 ~ -99)) %>%
  mutate(s2time = case_when(attsess2 == -99 ~ -99))
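For comparison, here is a minimal sketch of a shorter form, assuming the goal is to fill only the NAs in rows where the session flag is -99 and to leave existing responses untouched. It relies on across(), which needs dplyr 1.0.0 or later, and on the s1/s2 column prefixes from the example data. The helper name fill_skipped is hypothetical, not part of dplyr.

```r
library(dplyr)

df <- data.frame(
  attsess1 = c(1, 1, 1, -99, -99),
  attsess2 = c(-99, -99, 1, 1, 1),
  s1satis  = c(1, 1, 1, NA, NA),
  s1time   = c(1, 1, 1, NA, NA),
  s1qual   = c(1, 1, NA, NA, NA),
  s2satis  = c(NA, NA, NA, 1, 1),
  s2time   = c(NA, NA, 1, 1, 1)
)

# Hypothetical helper: replace NA with -99 only where the flag is -99;
# every other value (including NAs in attended sessions) is kept as is.
fill_skipped <- function(x, flag) {
  if_else(is.na(x) & flag == -99, -99, as.numeric(x))
}

df2 <- df %>%
  mutate(
    # Columns starting with "s1" are gated by attsess1, "s2" by attsess2.
    across(starts_with("s1"), ~ fill_skipped(.x, attsess1)),
    across(starts_with("s2"), ~ fill_skipped(.x, attsess2))
  )
```

With this approach, an NA that remains in df2 (for example s1qual in row 3) is a genuinely missing response rather than a skipped question, so it still counts in the missing data analysis.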
I tried using mutate_at() with case_when(), but received this error message: must be a double vector, not an integer vector. I also tried for loops nested with if conditions, but I can't remember the error message I received. I also came across several forums where the authors advocated replacing for loops with dplyr functions. Can anyone provide more insight into this?
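The failing call is not shown, so this is a guess: that message is what older dplyr versions of case_when() report when the right-hand sides have different types, for example an integer survey column combined with the double -99. A plain base R loop is not wrong here, and it sidesteps the strict type check because base assignment coerces types silently. The sketch below assumes a hand-written mapping (the hypothetical list branches) from each attendance flag to the questions it gates.

```r
df <- data.frame(
  attsess1 = c(1, 1, 1, -99, -99),
  attsess2 = c(-99, -99, 1, 1, 1),
  s1satis  = c(1, 1, 1, NA, NA),
  s1time   = c(1, 1, 1, NA, NA),
  s1qual   = c(1, 1, NA, NA, NA),
  s2satis  = c(NA, NA, NA, 1, 1),
  s2time   = c(NA, NA, 1, 1, 1)
)

# Hypothetical mapping: flag column -> question columns that depend on it.
branches <- list(
  attsess1 = c("s1satis", "s1time", "s1qual"),
  attsess2 = c("s2satis", "s2time")
)

for (flag in names(branches)) {
  # Rows where the participant did not attend this session.
  skipped <- !is.na(df[[flag]]) & df[[flag]] == -99
  for (question in branches[[flag]]) {
    # Only overwrite NAs; answered values are left alone.
    fill <- skipped & is.na(df[[question]])
    df[[question]][fill] <- -99
  }
}
```

The loop runs once per column rather than once per row, so each step is still a vectorized assignment. The usual argument for dplyr over loops is readability more than speed in a case like this.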
Thanks for your help!
s for NAs? - markus
attsess1 is 1, 1, 1, -99, -99. s1satis starts as 1, 1, 1, NA, NA. Based on your description, I would expect the desired result for s1satis to be 1, 1, 1, -99, -99, replacing the NA with -99 where attsess1 is -99 and leaving the 1 values untouched. But when I run your "working but slow" code, I get NA, NA, NA, -99, -99. Which is the right answer? - Gregor Thomas