0
votes

I have 3 data frames. The first data frame (say df1) has multiple rows and columns. The second and third data frames (say df2 and df3) have only one row and a subset of columns from df1. The column names in df2 and df3 are same. So what I want to do is to compare each row in df1 with the single row in df2 and df3. If the value of a cell from df1 matches with the cell content of df2, replace the value of the cell in df1 with 1 and if the value of the cell from df1 matches with df3, replace the value of the cell in df1 with 2 and if the cell content of df2 doesn't match with either df2 or df3, replace the value of the cell in df1 with -. I wrote a loop to do this but it is slow. I would like to know if there is any optimized way to do this. Thank you.

Here are the example data frames and the expected output:

df1
c1  c2  c3  c4  c5  c6  c7  c8  c9  c10 c11 c12
 q  w   e   r   t   y   q   w   e   r   t   y
 q  e   r   t   y   q   e   r   e   r   t   y
 w  e   r   t   y   t   q   w   e   r   w   t

df2
                c5  c6  c7  c8  c9  c10 c11 c12
                t   y   q   w   e   t   w   t

df3             
                c5  c6  c7  c8  c9  c10 c11 c12
                y   q   q   t   e   r   t   t

Expected output:                
c1  c2  c3  c4  c5  c6  c7  c8  c9  c10 c11 c12
q   w   e   r   1   1   1   1   1   2   2   -
q   e   r   t   2   2   -   -   1   2   2   -
w   e   r   t   2   -   1   1   1   2   1   1
2

2 Answers

0
votes

We can find common columns using intersect. Repeat rows of df2 and df3 and compare them with df1 and replace the matching values in df1 by 1 and that of df2 by 2 and replace all other by "-".

cols <- intersect(names(df1), names(df2))
df1[cols][df1[cols] == df2[rep(seq_len(nrow(df2)), nrow(df1)), ]] <- 1
df1[cols][df1[cols] == df3[rep(seq_len(nrow(df3)), nrow(df1)), ]] <- 2
df1[cols][(df1[cols] != 1) & (df1[cols] != 2)] <- "-"


df1
#  c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12
#1  q  w  e  r  1  1  1  1  1   2   2   -
#2  q  e  r  t  2  2  -  -  1   2   2   -
#3  w  e  r  t  2  -  1  1  1   2   1   1

Based on comments if we want to fill the remaining values in df1 which is not present in df2 and df3, we can find out the mismatched indices and use paste0 to paste values together.

temp_df2 <- df2[rep(seq_len(nrow(df2)), nrow(df1)), ]
temp_df3 <- df3[rep(seq_len(nrow(df2)), nrow(df1)), ]
df1[cols][df1[cols] == temp_df2] <- 1
df1[cols][df1[cols] == temp_df3] <- 2
inds <- (df1[cols] != 1) & (df1[cols] != 2)
df1[cols][inds] <- paste0(df1[cols][inds], temp_df2[inds], temp_df3[inds])

df1
#  c1 c2 c3 c4 c5  c6  c7  c8 c9 c10 c11 c12
#1  q  w  e  r  1   1   1   1  1   2   2 ytt
#2  q  e  r  t  2   2 eqq rwt  1   2   2 ytt
#3  w  e  r  t  2 tyq   1   1  1   2   1   1
0
votes

We can create couple of logical matrix by replicating the rows of 'df2' and 'df3' while selecting only the columns of df1 that are the same from 'df2' or 'df3', then we assgn the values in a single step by logical comparison

i1 <- df1[names(df2)] == df2[rep(1, nrow(df1)),]
i2 <- df1[names(df3)] == df3[rep(1, nrow(df1)),]
df1[names(df3)] <- ((!i1 & i2) + 1) * NA^(!i1 & !i2)
df1
#   c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12
#1  q  w  e  r  1  1  1  1  1   2   2  NA
#2  q  e  r  t  2  2 NA NA  1   2   2  NA
#3  w  e  r  t  2 NA  1  1  1   2   1   1

It may be better to have NA instead of - as - can change the column type from numeric to character

data

df1 <- structure(list(c1 = c("q", "q", "w"), c2 = c("w", "e", "e"), 
    c3 = c("e", "r", "r"), c4 = c("r", "t", "t"), c5 = c("t", 
    "y", "y"), c6 = c("y", "q", "t"), c7 = c("q", "e", "q"), 
    c8 = c("w", "r", "w"), c9 = c("e", "e", "e"), c10 = c("r", 
    "r", "r"), c11 = c("t", "t", "w"), c12 = c("y", "y", "t")), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(c5 = "t", c6 = "y", c7 = "q", c8 = "w", c9 = "e", 
    c10 = "t", c11 = "w", c12 = "t"), class = "data.frame", row.names = c(NA, 
-1L))

df3 <- structure(list(c5 = "y", c6 = "q", c7 = "q", c8 = "t", c9 = "e", 
    c10 = "r", c11 = "t", c12 = "t"), class = "data.frame", row.names = c(NA, 
-1L))