2
votes

I am trying to write a specialized ifelse() function that I want to pass to dplyr::mutate(across()). The function should replace NA values in columns specified in across() with those in similarly-named columns.

For instance in the following made-up data, I want to replace missing x_var1 with y_var1 and missing x_var2 with y_var2:

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5, 2, 0, 0,
             NA, 10, 8, 0,
             3, NA, 0, 5,
             NA, NA, 7, 9)   

I have tried constructing the following function:

ifelse_spec <- function(var) {
  new_var = paste("y_", str_remove(cur_column(), "x_"), sep = "")
 
  # print(new_var) # just to check new_var is correct 

  ifelse(is.na(var), !!sym(new_var) , var)  # how to call new_var?
}

x %>%
  mutate(across(c(x_var1, x_var2),
                ~ ifelse_spec(.)))

but it doesn't seem to work.

However, if I run this one-variable case using ifelse directly, I get the expected result.

x %>% 
  mutate(across(c(x_var1),
                ~ifelse(is.na(.), !!sym("y_var1"), .)))

How can I construct a custom ifelse statement that will allow me to call a data variable?

Edit: I got the following to work for the many-variable case, but still using ifelse and not a different function.

x %>% 
  mutate(across(c(x_var1, x_var2),
                ~ifelse(is.na(.), eval(sym(paste("y_", str_remove(cur_column(), "x_"), sep = ""))), . )))
1
I answered with what I think is a cleaner way to approach this problem, but as to why your function doesn't work: ifelse() is evaluated within the function's environment, and !!sym(new_var) (e.g. "y_var1") isn't defined there. You could try evaluating it in its calling frame, adding a .data argument to ifelse_spec, or rewriting it as a function factory that returns an expression to be evaluated within mutate(), but personally I wouldn't lean so heavily on non-standard evaluation if I didn't absolutely need to. - Joe Roe
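One way to sidestep the environment problem described in this comment is to look up the partner column inside the across() lambda, where the data mask is available, and hand plain vectors to the custom function. The following is only a sketch under that assumption: fill_na() is a hypothetical helper name, and the "^x_" to "y_" renaming rule is taken from the example data in the question.

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5, 2, 0, 0,
             NA, 10, 8, 0,
             3, NA, 0, 5,
             NA, NA, 7, 9)

# Hypothetical helper: takes two plain vectors, so it needs no
# knowledge of the data frame or of non-standard evaluation.
fill_na <- function(primary, fallback) {
  ifelse(is.na(primary), fallback, primary)
}

result <- x %>%
  mutate(across(
    c(x_var1, x_var2),
    # The lambda runs inside mutate()'s data mask, so the .data pronoun
    # can fetch the matching y_ column by name.
    ~ fill_na(.x, .data[[str_replace(cur_column(), "^x_", "y_")]])
  ))
```

Because the column lookup happens in the lambda rather than in the function body, fill_na() can be swapped for any other two-argument rule without touching the name-matching logic.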

1 Answer

1
votes

coalesce() is designed for this problem (filling missing values from other columns). You can simplify your one-variable case by using it instead of ifelse():

library(dplyr, warn.conflicts = FALSE)
library(stringr)
library(purrr)

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5, 2, 0, 0,
             NA, 10, 8, 0,
             3, NA, 0, 5,
             NA, NA, 7, 9)

x %>% 
  mutate(x_var1 = coalesce(x_var1, y_var1))
#> # A tibble: 4 x 4
#>   x_var1 x_var2 y_var1 y_var2
#>    <dbl>  <dbl>  <dbl>  <dbl>
#> 1      5      2      0      0
#> 2      8     10      8      0
#> 3      3     NA      0      5
#> 4      7     NA      7      9

You can then use select() to generalise this to coalesce across similarly-named columns:

x %>% 
  mutate(x_var1 = do.call(coalesce, select(., ends_with("var1"))))
#> # A tibble: 4 x 4
#>   x_var1 x_var2 y_var1 y_var2
#>    <dbl>  <dbl>  <dbl>  <dbl>
#> 1      5      2      0      0
#> 2      8     10      8      0
#> 3      3     NA      0      5
#> 4      7     NA      7      9

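Finally, use map_dfc() to apply this to each column, using pattern matching to extract the "column group" it belongs to. Note that this rebuilds every column in a group, so the y_ columns end up holding the same filled values as their x_ counterparts: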

x %>% 
  colnames() %>% 
  str_extract("var[0-9]") %>% 
  set_names(colnames(x)) %>% 
  map_dfc(~do.call(coalesce, select(x, ends_with(.))))
#> # A tibble: 4 x 4
#>   x_var1 x_var2 y_var1 y_var2
#>    <dbl>  <dbl>  <dbl>  <dbl>
#> 1      5      2      5      2
#> 2      8     10      8     10
#> 3      3      5      3      5
#> 4      7      9      7      9

You will need to adapt str_extract() and ends_with() to fit the column names in your real data, but I think this should generalise to any reasonable naming scheme. If it's important to apply a custom function to your real data instead of coalesce(), it should also be possible to rewrite map_dfc() to use it.
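As a rough illustration of that last point, here is one way the map_dfc() step might be adapted to a custom function while leaving the y_ columns untouched. Treat it as a sketch rather than the definitive approach: fill_missing() is a hypothetical stand-in for your own rule, and the "^x_" / "y_" prefixes are assumptions based on the example data.

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)
library(purrr)

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5, 2, 0, 0,
             NA, 10, 8, 0,
             3, NA, 0, 5,
             NA, NA, 7, 9)

# Hypothetical custom rule; replace the body with whatever logic you need.
fill_missing <- function(primary, fallback) {
  ifelse(is.na(primary), fallback, primary)
}

# Only the x_ columns are rebuilt.
x_cols <- str_subset(colnames(x), "^x_")

filled <- x_cols %>%
  set_names() %>%  # name each element after itself so the output keeps the column names
  map_dfc(~ fill_missing(x[[.x]], x[[str_replace(.x, "^x_", "y_")]]))

# Write the filled columns back, leaving the y_ columns as they were.
result <- x
result[x_cols] <- filled
```

Restricting the loop to the x_ columns is a design choice: it keeps the fallback columns as a record of the original values, at the cost of one extra assignment step at the end.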