Combine two data tables in R by a condition referring to two columns

Question

I have two data tables that I want to merge/join based on values in two columns that can appear in reverse order in the two data tables. Here are two example data tables:

library(data.table)
# df1
col1 <- c("aa", "bb", "cc", "dd") 
col2 <- c("bb", "zz", "dd", "ff") 
x <- c(130, 29, 122, 85)
dt1 <- data.table(col1, col2, x)

   col1  col2  x
1:   aa   bb 130
2:   bb   zz  29
3:   cc   dd 122
4:   dd   ff  85

# df2
col1 <- c("zz", "bb", "cc", "ff") 
col2 <- c("bb", "aa", "dd", "dd") 
y <- c(34, 567, 56, 101)
dt2 <- data.table(col1, col2, y)

    col1 col2  y
1:   zz   bb  34
2:   bb   aa 567
3:   cc   dd  56
4:   ff   dd 101

So the values in col1 and col2 taken together are the same for both data tables, but the distribution is different. E.g. aa is in col1 in dt1, but in col2 in dt2. I want to merge/join the data tables based on pairs of col1 and col2, but they might be in reverse order in the other data table. (Note that simply sorting them doesn't work.)

This means that the merge/join etc. must be able to 'see' that the pair aa+bb in dt1 occurs as bb+aa in dt2 and assign the correct value of dt2, i.e. the desired output is either:

   col1 col2   x   y
1:   aa   bb 130 567
2:   bb   zz  29  34
3:   cc   dd 122  56
4:   dd   ff  85 101

or this (i.e. whether the order of dt1 or dt2 is retained doesn't matter):

   col1 col2   x   y
1:   zz   bb  29  34
2:   bb   aa 130 567
3:   cc   dd 122  56
4:   ff   dd  85 101

My original data tables have approx. 3 million rows (yes, they're huge), so doing anything by hand is out of the question. I've looked around here, but I couldn't find any solution that is applicable to my case. Does anyone know how to do this?

Any hints are greatly appreciated!

dt2[col1 > col2, c("col1", "col2") := .(col2, col1)]; dt1[dt2, on=.(col1, col2)] works. Or you could use := to add the column to dt1 as sirallen suggested. — Frank
@Frank, this works! Thanks so much! Do you maybe want to post that as an answer? — SandraA.
Np, glad it works :) Feel free to edit your answer with it or something similar. — Frank

sirallen sirallen · Accepted Answer · 2017-08-02T04:10:28

You can do the following:

dt1[dt2, on=.(col1, col2), y:= y]

dt1[dt2, on=.(col1==col2, col2==col1), y:= i.y]

> dt1
#    col1 col2   x   y
# 1:   aa   bb 130 567
# 2:   bb   zz  29  34
# 3:   cc   dd 122  56
# 4:   dd   ff  85 101

Combine two data tables in R by a condition referring to two columns

3 Answers