Replace multiple NA values with a correct values from a different set of columns

Question

I have a dataframe with missing data for some rows. The correct data can be found in another set of columns. I want to replace the NAs with the correct data.

My data looks like this:

df <- data.frame(M_1=c(1,NA,3,NA,6),
           M_2=c(5,NA,3,NA,1),
           M_3=c(6,NA,2,NA,4),
           M_C_1=c(NA,2,NA,6,NA),
           M_C_2=c(NA,1,NA,4,NA),
           M_C_3=c(NA,7,NA,3,NA))
df
#   M_1 M_2 M_3 M_C_1 M_C_2 M_C_3
#1   1   5   6    NA    NA    NA
#2  NA  NA  NA     2     1     7
#3   3   3   2    NA    NA    NA
#4  NA  NA  NA     6     4     3
#5   6   1   4    NA    NA    NA

For all records, I either have a complete set of records for variables M_1, M_2, and M_3 or I have a complete set for variables M_C_1, M_C_2, and M_C_3.

For each row that has NAs in the first set of variables (M_1:M_3), I would like to replace with the values from the second set of values (M_C_1:M_C_2). I dont need to retain the second set of values. So my desired data frame would look like:

df
#   M_1 M_2 M_3
#1   1   5   6    
#2   2   1   7
#3   3   3   2
#4   6   4   3     
#5   6   1   4

My real dataset contains many columns in this notation, so I need a general solutions (ie, I dont want to refer to each column individually).

I would like to do this with dplyr if possible.

Darren Tsai Darren Tsai · Accepted Answer · 2022-07-28T12:22:32

You could use map2 + coalesce:

library(dplyr)
library(purrr)

map2_dfc(select(df, 1:3), select(df, 4:6), coalesce)

# # A tibble: 5 × 3
#     M_1   M_2   M_3
#   <dbl> <dbl> <dbl>
# 1     1     5     6
# 2     2     1     7
# 3     3     3     2
# 4     6     4     3
# 5     6     1     4

Replace multiple NA values with a correct values from a different set of columns

3 Answers