How to select rows in a data.table matching simultaneously two conditions in another data.table?

Question

I have two data.tables DT1 and DT2, with DT1 possibly large and more columns than DT2. I want to select rows in DT1 where two columns of DT1 have exact matches in the same row of two columns in DT2. For example

DT1 = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), z=1:9)
DT2 = data.table(f=c("a","b"), g=c(1,3))

The output, DT1sub, I'm looking for is

   x y z
1: a 1 4
2: b 3 2

My problem is, when I try to subset DT1, I also get those rows for which only one column matches

> DT1[x%in%DT2$f & y%in%DT2$g]
#    x y z
# 1: b 1 1
# 2: b 3 2
# 3: a 1 4
# 4: a 3 5

I could get my desired output, DT1sub, with a clunky for loop like

DT1sub<-c()
for (i in 1:2)
  DT1sub<-rbind(DT1sub,DT1[x==DT2$f[i] & y==DT2$g[i]])
DT1sub

but I was wondering if there was a smarter data.table version of this. This is probably straightforward, but I couldn't piece it together from the example("data.table").

arg0naut91 arg0naut91 · Accepted Answer · 2019-02-23T15:36:31

Are you looking for:

library(data.table)

DT1sub <- DT1[DT2, on = .(x = f, y = g)]

Output:

   x y z
1: a 1 4
2: b 3 2

This is basically a filtering join - it only keeps those rows in x that match anything in f, and the same for y and g.

How to select rows in a data.table matching simultaneously two conditions in another data.table?

3 Answers