Replace values in dataframe based on multiple rows from other dataframes

Question

I have three data frames. The first (df1) has multiple rows and columns. The second and third (df2 and df3) each have a single row and a subset of the columns of df1; their column names are identical. I want to compare each row of df1 with the single row of df2 and of df3. If a cell in df1 matches the corresponding cell in df2, replace it with 1; if it matches the corresponding cell in df3, replace it with 2; and if it matches neither df2 nor df3, replace it with -. I wrote a loop to do this, but it is slow. Is there a more efficient way to do this? Thank you.

Here are the example data frames and the expected output:

df1
c1  c2  c3  c4  c5  c6  c7  c8  c9  c10 c11 c12
 q  w   e   r   t   y   q   w   e   r   t   y
 q  e   r   t   y   q   e   r   e   r   t   y
 w  e   r   t   y   t   q   w   e   r   w   t

df2
                c5  c6  c7  c8  c9  c10 c11 c12
                t   y   q   w   e   t   w   t

df3             
                c5  c6  c7  c8  c9  c10 c11 c12
                y   q   q   t   e   r   t   t

Expected output:                
c1  c2  c3  c4  c5  c6  c7  c8  c9  c10 c11 c12
q   w   e   r   1   1   1   1   1   2   2   -
q   e   r   t   2   2   -   -   1   2   2   -
w   e   r   t   2   -   1   1   1   2   1   1

Ronak Shah · Accepted Answer · 2019-09-24T05:28:16

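We can find the common columns using intersect. Repeat the single row of df2 and of df3 so that each has as many rows as df1, compare them with df1, replace the values in df1 that match df2 with 1 and those that match df3 with 2, and replace everything else with "-". Note that this relies on the columns being character rather than factor.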

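# Columns shared by df1 and df2 (df3 has the same names)
cols <- intersect(names(df1), names(df2))
# Repeat the single row of df2/df3 to nrow(df1) rows, then compare cell by cell
df1[cols][df1[cols] == df2[rep(seq_len(nrow(df2)), nrow(df1)), cols]] <- 1
df1[cols][df1[cols] == df3[rep(seq_len(nrow(df3)), nrow(df1)), cols]] <- 2
# Anything that matched neither becomes "-"
df1[cols][(df1[cols] != 1) & (df1[cols] != 2)] <- "-"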


df1
#  c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12
#1  q  w  e  r  1  1  1  1  1   2   2   -
#2  q  e  r  t  2  2  -  -  1   2   2   -
#3  w  e  r  t  2  -  1  1  1   2   1   1
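
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example frames from the question and gets the same result column by column, without repeating rows. It assumes the columns are character (hence colClasses = "character"). The helper name recode_col is made up for illustration and is not part of the answer above.

```r
# Example data from the question, read as character columns (assumption).
df1 <- read.table(text = "c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12
q w e r t y q w e r t y
q e r t y q e r e r t y
w e r t y t q w e r w t", header = TRUE, colClasses = "character")

df2 <- read.table(text = "c5 c6 c7 c8 c9 c10 c11 c12
t y q w e t w t", header = TRUE, colClasses = "character")

df3 <- read.table(text = "c5 c6 c7 c8 c9 c10 c11 c12
y q q t e r t t", header = TRUE, colClasses = "character")

cols <- intersect(names(df1), names(df2))

# Hypothetical helper: recode one column of df1 against the single
# reference value from df2 (-> "1") and from df3 (-> "2").
# A cell that matches both is coded "1", as in the expected output.
recode_col <- function(x, v2, v3) {
  ifelse(x == v2, "1", ifelse(x == v3, "2", "-"))
}

res <- df1
# Map walks the shared columns of the three frames in parallel.
res[cols] <- Map(recode_col, df1[cols], df2[cols], df3[cols])
res
```

Since each frame is subset with cols before being passed to Map, the columns are paired by name order from df1, so df2 and df3 do not need to list their columns in the same order.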

Based on the comments, if we instead want to fill the cells of df1 that match neither df2 nor df3, we can find the mismatched indices and use paste0 to paste the three values together (the df1 value followed by the df2 and df3 values).

# Start from the original df1, with cols defined as above
temp_df2 <- df2[rep(seq_len(nrow(df2)), nrow(df1)), cols]
temp_df3 <- df3[rep(seq_len(nrow(df3)), nrow(df1)), cols]
df1[cols][df1[cols] == temp_df2] <- 1
df1[cols][df1[cols] == temp_df3] <- 2
# Cells that matched neither df2 nor df3
inds <- (df1[cols] != 1) & (df1[cols] != 2)
df1[cols][inds] <- paste0(df1[cols][inds], temp_df2[inds], temp_df3[inds])

df1
#  c1 c2 c3 c4 c5  c6  c7  c8 c9 c10 c11 c12
#1  q  w  e  r  1   1   1   1  1   2   2 ytt
#2  q  e  r  t  2   2 eqq rwt  1   2   2 ytt
#3  w  e  r  t  2 tyq   1   1  1   2   1   1
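
The paste0 variant can also be sketched column by column, which avoids relying on the three logical-matrix subsets lining up in the same order. This is only a sketch under the same assumption as before (character columns, example data from the question); fill_col is a hypothetical helper name.

```r
# Example data from the question, read as character columns (assumption).
df1 <- read.table(text = "c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12
q w e r t y q w e r t y
q e r t y q e r e r t y
w e r t y t q w e r w t", header = TRUE, colClasses = "character")

df2 <- read.table(text = "c5 c6 c7 c8 c9 c10 c11 c12
t y q w e t w t", header = TRUE, colClasses = "character")

df3 <- read.table(text = "c5 c6 c7 c8 c9 c10 c11 c12
y q q t e r t t", header = TRUE, colClasses = "character")

cols <- intersect(names(df1), names(df2))

# Hypothetical helper: "1" for a df2 match, "2" for a df3 match,
# otherwise the df1, df2 and df3 values pasted together.
fill_col <- function(x, v2, v3) {
  ifelse(x == v2, "1", ifelse(x == v3, "2", paste0(x, v2, v3)))
}

res <- df1
res[cols] <- Map(fill_col, df1[cols], df2[cols], df3[cols])
res
```

With the example data this should give the same table as above, for instance "ytt" in the first two rows of c12 and "tyq" in the third row of c6.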

data