0
votes

Long-time lurker, first-time poster. I am relatively new to R and am cleaning a 200-row survey dataset. The survey used quite a bit of branching, so it returns NA wherever a participant was never shown a question, which throws off my missing-data analysis. I have spent a day and a half trying to find an elegant solution (for loops, if conditions, and dplyr's filter(), mutate(), and case_when()). I settled on a solution using mutate(), but I know there's a better way to do this.

Here's a reproducible data frame to act as the survey data:

df <- data.frame(
  attsess1 = c(1, 1, 1, -99, -99),
  attsess2 = c(-99, -99, 1, 1, 1),
  s1satis  = c(1, 1, 1, NA, NA),
  s1time   = c(1, 1, 1, NA, NA),
  s1qual   = c(1, 1, NA, NA, NA),
  s2satis  = c(NA, NA, NA, 1, 1),
  s2time   = c(NA, NA, 1, 1, 1)
)

Essentially, wherever attsess1 equals -99, I want the NAs in the corresponding rows of s1satis, s1time, and s1qual changed to -99. The same logic applies to attsess2 with s2satis and s2time.

The code below is what I used. It works, but it takes too many lines, which could be a problem for large datasets with many variables:

library(dplyr)
df1 <- df %>%
  mutate(s1satis = case_when(attsess1 == -99 ~ -99)) %>%
  mutate(s1time = case_when(attsess1 == -99 ~ -99)) %>%
  mutate(s1qual = case_when(attsess1 == -99 ~ -99)) %>%
  mutate(s2satis = case_when(attsess2 == -99 ~ -99)) %>%
  mutate(s2time = case_when(attsess2 == -99 ~ -99))

I tried using mutate_at() with case_when(), but received this error message: "must be a double vector, not an integer vector". I also tried for loops nested with if conditions, but I can't remember the error message I received. I also came across several forum posts whose authors advocated replacing for loops with dplyr functions. Can anyone provide more insight into this?

Thanks for your help!

2
Is it expected to change 1s for NAs? - markus
Hi Markus, it's expected to change NA to -99. Does this answer your question? - Sarah Narvaiz
In your example attsess1 is 1, 1, 1, -99, -99. s1satis starts as 1, 1, 1, NA, NA. Based on your description, I would expect the desired result for s1satis to be 1, 1, 1, -99, -99, replacing the NA with -99 where attsess1 is -99 and leaving the 1 values untouched. But when I run your "working but slow" code, I get NA, NA, NA, -99, -99. Which is the right answer? - Gregor Thomas
Gregor, you're right: the 1s should be untouched. Your expected result is the correct one, not what my code produces (which is annoying). - Sarah Narvaiz

2 Answers

0
votes

Make sure your dplyr is up to date (across() requires dplyr 1.0.0 or later), and this idea should work:

df %>%
  mutate(
    across(starts_with("s1"), ~ case_when(attsess1 == -99 ~ -99, TRUE ~ .)),
    across(starts_with("s2"), ~ case_when(attsess2 == -99 ~ -99, TRUE ~ .))
  )

#   attsess1 attsess2 s1satis s1time s1qual s2satis s2time
# 1        1      -99       1      1      1     -99    -99
# 2        1      -99       1      1      1     -99    -99
# 3        1        1       1      1     NA      NA      1
# 4      -99        1     -99    -99    -99       1      1
# 5      -99        1     -99    -99    -99       1      1

Though I'm not really sure about your desired result, see my comment on your question.
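If the goal is to touch only the NAs (as clarified in the comments on the question), a slightly stricter variant may be safer, since it leaves any real answer alone even when the gate column is -99. This is only a sketch: `fill_skipped` is a hypothetical helper name, not a dplyr function, and it assumes the s1/s2 column prefixes from the example data.

```r
library(dplyr)

df <- data.frame(
  attsess1 = c(1, 1, 1, -99, -99),
  attsess2 = c(-99, -99, 1, 1, 1),
  s1satis  = c(1, 1, 1, NA, NA),
  s1time   = c(1, 1, 1, NA, NA),
  s1qual   = c(1, 1, NA, NA, NA),
  s2satis  = c(NA, NA, NA, 1, 1),
  s2time   = c(NA, NA, 1, 1, 1)
)

# Hypothetical helper: replace NA with `code` only in rows where the
# gate column equals `code`; every non-NA value is left unchanged.
# %in% is used so that an NA in the gate column counts as "no match".
fill_skipped <- function(x, gate, code = -99) {
  replace(x, is.na(x) & gate %in% code, code)
}

df_filled <- df %>%
  mutate(
    across(starts_with("s1"), ~ fill_skipped(.x, attsess1)),
    across(starts_with("s2"), ~ fill_skipped(.x, attsess2))
  )

df_filled
```

On the example data this gives the same result as the case_when() version above; the two only differ when a session column holds a real value in a row where the gate is -99.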

0
votes

If you want to take an entire dataset and replace all NAs with -99, you can use this:

df %>%
  mutate_all(~replace(., is.na(.), -99))
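In dplyr 1.0.0 and later, mutate_all() is superseded by across(), so the same idea can be sketched as below. Note that this blanket replacement also recodes NAs that are genuinely missing (for example, s1qual in row 3 of the question's data, where attsess1 is 1), so it may not suit a missing-data analysis that needs to tell skipped questions apart from unanswered ones.

```r
library(dplyr)

df <- data.frame(
  attsess1 = c(1, 1, 1, -99, -99),
  attsess2 = c(-99, -99, 1, 1, 1),
  s1satis  = c(1, 1, 1, NA, NA),
  s1time   = c(1, 1, 1, NA, NA),
  s1qual   = c(1, 1, NA, NA, NA),
  s2satis  = c(NA, NA, NA, 1, 1),
  s2time   = c(NA, NA, 1, 1, 1)
)

# Replace every NA in every column with -99, regardless of the gate columns
df_all <- df %>%
  mutate(across(everything(), ~ replace(.x, is.na(.x), -99)))

# Row 3 of s1qual was a true missing value, but it is now -99 as well
df_all$s1qual
```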