I have two data tables that I want to merge/join based on values in two columns that can appear in reverse order in the two data tables. Here are two example data tables:
library(data.table)
# df1
col1 <- c("aa", "bb", "cc", "dd")
col2 <- c("bb", "zz", "dd", "ff")
x <- c(130, 29, 122, 85)
dt1 <- data.table(col1, col2, x)
col1 col2 x
1: aa bb 130
2: bb zz 29
3: cc dd 122
4: dd ff 85
# df2
col1 <- c("zz", "bb", "cc", "ff")
col2 <- c("bb", "aa", "dd", "dd")
y <- c(34, 567, 56, 101)
dt2 <- data.table(col1, col2, y)
col1 col2 y
1: zz bb 34
2: bb aa 567
3: cc dd 56
4: ff dd 101
So the values in col1 and col2 taken together are the same for both data tables, but the distribution is different. E.g. aa is in col1 in dt1, but in col2 in dt2. I want to merge/join the data tables based on pairs of col1 and col2, but they might be in reverse order in the other data table. (Note that simply sorting them doesn't work.)
This means that the merge/join etc. must be able to 'see' that the pair aa+bb in dt1 occurs as bb+aa in dt2 and assign the correct value of dt2, i.e. the desired output is either:
col1 col2 x y
1: aa bb 130 567
2: bb zz 29 34
3: cc dd 122 56
4: dd ff 85 101
or this (i.e. whether the order of dt1 or dt2 is retained doesn't matter):
col1 col2 x y
1: zz bb 29 34
2: bb aa 130 567
3: cc dd 122 56
4: ff dd 85 101
My original data tables have approx. 3 million rows (yes, they're huge), so doing anything by hand is out of the question. I've looked around here, but I couldn't find any solution that is applicable to my case. Does anyone know how to do this?
Any hints are greatly appreciated!
dt2[col1 > col2, c("col1", "col2") := .(col2, col1)]; dt1[dt2, on=.(col1, col2)]
works. Or you could use:=
to add the column todt1
as sirallen suggested. – Frank