Merge dataframes prioritising one over another

Question

I have two dataframes like below that shares the column names but information can be conflicting. (First dataframe has more information (ranging 0-5), second has less (range 0 and 4). I merge this two dataframes using dplyr or other R package but what I want to do is to use the information from the first dataframe when available, and when it isn't I use the information from the second dataframe.

For example, two dataframes to be merged are

ID    col1   col2  
A      0      0
B      3      0
C      5      3

ID   col1    col2 
A     4       4
B     0       0
C     4       4
D     0       4
E     4       4

What I would like after merging is:

ID    col1   col2
A      4      4
B      3      0
C      5      3
D      0      4
E      4      4

Can you expand on "more useful". It looks like you mean "bigger". Row C demonstrates you wanting the data from df1 because it contains the bigger range (5). Is that correct? — Amar
@Amar Yes. I meant by more info that df1 has more detailed information bc the range is 0-5, while df2 has two level (0 and 4). On the other hand, df1 has three rows and df2 has five rows. In that sense, df2 indeed has more info. Sorry for the confusing expression — Yoshiki

MKR MKR · Accepted Answer · 2018-02-20T07:03:39

One option can be using dplyr and tidyr. First replace 0 in dataframes with NA which helps in deciding if that value is valid or not. Merge both dataframe using bind_rows. Then group on ID so that rows are arrange on that column. Use fill to replace NA from other row of same ID.

# Data
df1 <- read.table(text="ID    col1   col2  
A      0      0
B      3      0
C      5      3", header = T, stringsAsFactors = F)

df2 <- read.table(text = "ID   col1    col2 
A     4       4
B     0       0
C     4       4
D     0       4
E     4       4", header = T, stringsAsFactors = F)

df1[df1==0] <- NA
df2[df2==0] <- NA

library(dplyr)
library(tidyr)

df1 %>% bind_rows(df2) %>%
  group_by(ID) %>%
  fill(col1, col2, .direction = "up")  %>%
  fill(col1, col2, .direction = "down") %>%
  unique() %>%
  filter((row_number() == 1)) 

#Result
#  ID     col1  col2
#  <chr> <int> <int>
#1 A         4     4
#2 B         3    NA
#3 C         5     3
#4 D        NA     4
#5 E         4     4

Note: The NA can be replaced with 0

Merge dataframes prioritising one over another

1 Answers