1
votes

I want to recode multiple variables (circa 90 of them) based on a value in another variable.

Here's a sample:

df <- data.frame(var1 = c(1,0,5,10,0), var2 = c(1:5), var3 = c(1:5), var4 = 
c(1:5), var5 = c(1:5), var6 = c(1:5), var90 = c(1:5))

df

  var1 var2 var3 var4 var5 var6 var90
1    1    1    1    1    1    1     1
2    0    2    2    2    2    2     2
3    5    3    3    3    3    3     3
4   10    4    4    4    4    4     4
5    0    5    5    5    5    5     5

When var1 = 0, I want to recode var2 to var90 as 0, otherwise leave them as they are.

I can recode a single variable easily enough using:

df[df$var1 == 0, c("var2")] <- 0

And I can apply this over multiple columns using numeric notation:

df[df$var1 == 0, c(2:7)] <- 0

Which does the job nicely on this little sample set, this is the expected result:

  var1 var2 var3 var4 var5 var6 var90
1    1    1    1    1    1    1     1
2    0    0    0    0    0    0     0
3    5    3    3    3    3    3     3
4   10    4    4    4    4    4     4
5    0    0    0    0    0    0     0

However, on my full dataset (~90 variables, 90,000 rows) I had to abort after about 20 minutes because it still hadn't run (hoping to get this to run in seconds rather than minutes).

Any ideas of how to go about this efficiently?

3

3 Answers

3
votes

We can use set from data.table to make this faster

library(data.table)
setDT(df)
for(j in 2:7) set(df, i = which(df$var1 == 0), j = j, value = 0)
df
#   var1 var2 var3 var4 var5 var6 var90
#1:    1    1    1    1    1    1     1
#2:    0    0    0    0    0    0     0
#3:    5    3    3    3    3    3     3
#4:   10    4    4    4    4    4     4
#5:    0    0    0    0    0    0     0

In addition, we can use lapply from base R to avoid creating a big logical matrix

df[2:7] <- lapply(df[2:7], function(x) replace(x, df$var1 == 0, 0))
3
votes

With dplyr:

library(dplyr)

df %>%
  mutate_at(vars(var2:var90), funs(ifelse(var1 == 0, 0, .)))

Output:

  var1 var2 var3 var4 var5 var6 var90
1    1    1    1    1    1    1     1
2    0    0    0    0    0    0     0
3    5    3    3    3    3    3     3
4   10    4    4    4    4    4     4
5    0    0    0    0    0    0     0
3
votes

We can use some maths as well

df[2:7] <- df[2:7] * +(df$var1 != 0)

#  var1 var2 var3 var4 var5 var6 var90
#1    1    1    1    1    1    1     1
#2    0    0    0    0    0    0     0
#3    5    3    3    3    3    3     3
#4   10    4    4    4    4    4     4
#5    0    0    0    0    0    0     0

We convert var1 into 0, 1 for values which are equal to 0 and not equal to 0 respectively and then multiply those values with rest of the columns so 0 values become 0 whereas non 0 values stay as it is.