merging tables with different column names

Question

If I understand correctly, by default data.table merges two tables by comparing columns which are set as keys and have same names. How should I write if I have tables with different column names? For example:

set.seed(123)
DT1<-data.table(col1=sample(letters,5,replace=TRUE),col2=sample(LETTERS[1:5],5,replace=TRUE),col3=sample(1:2,5,replace=TRUE))
DT2<-data.table(col4=sample(1:3,10,replace=TRUE),col5=sample(LETTERS[1:5],10,replace=TRUE),col6=sample(1:100,10,replace=TRUE))
(DT1)
(DT2)

> (DT1)
   col1 col2 col3
1:    h    A    2
2:    u    C    1
3:    k    E    2
4:    w    C    2
5:    y    C    1
> (DT2)
    col4 col5 col6
 1:    3    D   48
 2:    1    C   76
 3:    1    C   22
 4:    1    B   32
 5:    3    A   24
 6:    3    E   15
 7:    3    E   42
 8:    2    D   42
 9:    3    D   37
10:    2    A   16

What should I write in DT1[ to make a merge selecting only rows with col2==col5 & col3==col4? Here is an expected output:

   col1 col2 col3 col4 col5 col6
    h    A    2    2    A   16
    u    C    1    1    C   76
    u    C    1    1    C   22
    y    C    1    1    C   76
    y    C    1    1    C   22

Thanks in advance!

i'll add that this is a good question to have in mind when trying to figure out why data.table merged on the "wrong" columns when you haven't specified the keys prudently. — MichaelChirico

Arun Arun · Accepted Answer · 2013-04-16T21:31:16

Using data.table's subset based joins along with the recently implemented on= argument and nomatch=0L, this is simply:

DT2[DT1, on=c(col5="col2", col4="col3"), nomatch=0L]

See the secondary indices vignette for more.

Alternatively if you've the data.tables keyed, then you can skip the on= argument. But the solution above would be idiomatic as it retains the order of original data.tables, and it is clear to tell what columns are being looked up by looking at the code.

setkey(DT1, col2, col3)
setkey(DT2, col5, col4)
DT2[DT1, nomatch=0L]

See history for older versions.

merging tables with different column names

1 Answers