Selecting columns based on missing values in each row

Question

For each row, I would like to know which columns contain NA. The goal is to create a new column that lists, for that row, the names of the columns whose values are NA, preferably using dplyr.

Using this mock data,

library(dplyr)

data = tibble(var_1 = c(NA, 4, 5, 6, 7), var_2 = c(4, 5, 6, 7, 8), var_3 = c(NA, NA, NA, 3, 5))

I'd like to create the missing_col column:

  var_1 var_2 var_3       missing_col
1    NA     4    NA  "var_1", "var_3"             
2     4     5    NA           "var_3"
3     5     6    NA           "var_3"
4     6     7     3                NA
5     7     8     5                NA

So far I have tried rowwise() together with mutate(), a nested select_if(), and a predicate function. However, none of the functions I have tried let me look at each row individually; they all operate on the entire column. The general structure of my approach is below.

data %>% 
  rowwise() %>%
  mutate(missing_col = select_if(function(x) ... )) %>%
  names()
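For reference, the rowwise() idea can be made to work by reading the current row with c_across() instead of select_if(), since select_if() always tests whole columns. This is a minimal sketch, assuming dplyr 1.0.0 or later (where c_across() was introduced); `cols` is a helper variable introduced here, and rows with no NA get a real NA rather than the string "NA".

```r
library(dplyr)

data <- tibble(var_1 = c(NA, 4, 5, 6, 7),
               var_2 = c(4, 5, 6, 7, 8),
               var_3 = c(NA, NA, NA, 3, 5))

# Names of the columns to check, captured before missing_col is added
cols <- names(data)

result <- data %>%
  rowwise() %>%
  mutate(missing_col = {
    # c_across() returns the current row's values for the selected columns, in order
    miss <- cols[is.na(c_across(all_of(cols)))]
    # One string per row; a real NA when nothing is missing
    if (length(miss) == 0) NA_character_ else paste(miss, collapse = ", ")
  }) %>%
  ungroup()

print(result)
```

rowwise() evaluates the expression once per row, so it reads naturally but can be slow on large data.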

Any guidance toward the appropriate function would be appreciated.

Jilber Urbina · Accepted Answer · 2019-03-27T20:40:16

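library(dplyr)
library(purrr)  # for map_chr()

data %>% 
  # apply() walks the rows; which(is.na(x)) keeps the names of the NA columns
  mutate(missing_col = apply(., 1, function(x) which(is.na(x))) %>% 
           # collapse each row's names into one string ("NA" when there are none)
           map_chr(., function(x) if_else(length(x) == 0, 
                                          "NA", 
                                          paste(names(x), collapse = ", "))))
#> # A tibble: 5 x 4
#>   var_1 var_2 var_3 missing_col 
#>   <dbl> <dbl> <dbl> <chr>       
#> 1    NA     4    NA var_1, var_3
#> 2     4     5    NA var_3       
#> 3     5     6    NA var_3       
#> 4     6     7     3 NA          
#> 5     7     8     5 NA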
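One caveat with the approach above: apply() simplifies its result when every row yields the same number of NA positions. If each row has exactly one NA, it returns a plain integer vector with the column names dropped, so map_chr() produces empty strings; if no row has any NA, the result has length zero and mutate() fails. A minimal sketch of a variant that avoids this, assuming dplyr is available; `list_missing_cols()` is a hypothetical helper name, and rows without NA get a real NA instead of the string "NA".

```r
library(dplyr)

data <- tibble(var_1 = c(NA, 4, 5, 6, 7),
               var_2 = c(4, 5, 6, 7, 8),
               var_3 = c(NA, NA, NA, 3, 5))

# Hypothetical helper: one comma-separated string of NA column names per row
list_missing_cols <- function(df) {
  na_mat <- is.na(as.matrix(df))  # logical matrix: rows of df x columns of df
  cols <- colnames(df)
  # vapply() is guaranteed to return exactly one string per row,
  # so the result cannot be simplified into a different shape
  vapply(seq_len(nrow(df)), function(i) {
    miss <- cols[na_mat[i, ]]
    if (length(miss) == 0) NA_character_ else paste(miss, collapse = ", ")
  }, character(1))
}

result <- data %>% mutate(missing_col = list_missing_cols(data))

# Every row has exactly one NA: the case where apply() would simplify
one_na <- tibble(a = c(NA, 1), b = c(1, NA))
list_missing_cols(one_na)
#> [1] "a" "b"
```

The NA test is done once on the whole matrix, and only the per-row name lookup loops, which keeps the output length tied to nrow(df) regardless of how many NAs each row has.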
