Using dplyr mutate_at when a function takes multiple arguments which are different columns

Question

I have a data.frame with a large number of columns whose names follow a pattern. Such as:

df <- data.frame(
  x_1 = c(1, NA, 3), 
  x_2 = c(1, 2, 4), 
  y_1 = c(NA, 2, 1), 
  y_2 = c(5, 6, 7)
)

I would like to apply mutate_at to perform the same operation on each pair of columns. As in:

df %>%
  mutate(
    x = ifelse(is.na(x_1), x_2, x_1), 
    y = ifelse(is.na(y_1), y_2, y_1)
  )

Is there a way I can do that with mutate_at/mutate_each?

This:

df %>%
  mutate_each(vars(x_1, y_1), funs(ifelse(is.na(.), vars(x_2, y_2), .)))

and various variations I've tried all fail.

The question is similar to Using functions of multiple columns in a dplyr mutate_at call, but different in that the second argument to the function call is not a single column, but a different column for each column in vars.

Thanks in advance.

I'm working with a similar thing right now. It's the same issue as in my previous question: stackoverflow.com/questions/47005763/…, but in this case the dataset is so big that RStudio crashes. — LightonGlass
a data.table set loop is probably one of the faster ways to do this. dplyr::coalesce might be a bit better for readability as well — zacdav

thothal thothal · Accepted Answer · 2018-07-11T12:35:56

Old question, but I agree with Jesse that you need to tidy your data a bit. gather would be the way to go, but it lacks somehow the possibility of stats::reshape where you can specify groups of columns to gather. So here's a solution with reshape:

df %>% 
   reshape(varying   = list(c("x_1", "y_1"), c("x_2", "y_2")), 
           times     = c("x", "y"),
           direction = "long") %>% 
   mutate(x = ifelse(is.na(x_1), x_2, x_1)) %>% 
   reshape(idvar     = "id", 
           timevar   = "time",
           direction = "wide") %>% 
   rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
#   id x_1 x_2 x y_1 y_2 y
# 1  1   1   1 1  NA   5 5
# 2  2  NA   2 2   2   6 2
# 3  3   3   4 3   1   7 1

In order to do that with any number of column pairs, you could do something like:

df2 <- setNames(cbind(df, df), c(t(outer(letters[23:26], 1:2, paste, sep = "_"))))
v <- split(names(df2), purrr::map_chr(names(df2), ~ gsub(".*_(.*)", "\\1", .)))
n <- unique(purrr::map_chr(names(df2), ~ gsub("_[0-9]+", "", .) ))
df2 %>% 
    reshape(varying   = v, 
            times     = n,
            direction = "long") %>% 
     mutate(x = ifelse(is.na(!!sym(v[[1]][1])), !!sym(v[[2]][1]), !!sym(v[[1]][1]))) %>% 
     reshape(idvar     = "id", 
             timevar   = "time",
             direction = "wide") %>% 
     rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
#   id w_1 w_2 w x_1 x_2 x y_1 y_2 y z_1 z_2 z
# 1  1   1   1 1  NA   5 5   1   1 1  NA   5 5
# 2  2  NA   2 2   2   6 2  NA   2 2   2   6 2
# 3  3   3   4 3   1   7 1   3   4 3   1   7 1

This assumes that columns which should be compared are next to each other and that all columns for with possible NA values are in columns suffixed by _1 and the replacement value columns are sufficed by _2.

Using dplyr mutate_at when a function takes multiple arguments which are different columns

3 Answers