data.table default outer join with setkey, counting rows (tabulate frequency of keys)

Question

I have two data.tables. I want to use the keys in one to count the occurrences in another.

DT1 <- data.table(A = c("v1","v2","v1","v2","v3","v4","v5","v6","v1","v2"),
                  B = c(1,11,111,2,22,222,3,33,333,4))
DT2 <- data.table(C = c("v1","v3","v99"))

setkey(DT1,A)
setkey(DT2,C)

Use DT2 to find rows in DT1 returns an outer join.

nrow(DT1[DT2,]) #returns 5 (v1*3, v3*1, v99) "v99" not in DT2
DT1[DT2,]

    A   B
1:  v1   1
2:  v1 111
3:  v1 333
4:  v3  22
5: v99  NA

Is it possible to force to left join with the same syntax?

Actually this is a left join... an outer join would be merge(DT1, DT2, by.x = "A", by.y = "C", all = TRUE). And v99 is present in DT2 — David Arenburg
You want to tabulate the (frequency of) keys in DT1[,A] according to the (partial) list of keys in DT2[,C]. Do you care about tabulating the other keys not included in DT2[,C] ? They get silently dropped, whereas R's base::tabulate allows the user to include them (by tweaking nbins). This is one case where dt's functionality is less than base. — smci

Frank Frank · Accepted Answer · 2016-04-07T14:14:24

I want to use the keys in one to count the occurrences in another.

DT1[DT2, .N, on=c(A="C"), by=.EACHI]

     A N
1:  v1 3
2:  v3 1
3: v99 0

data.table default outer join with setkey, counting rows (tabulate frequency of keys)

1 Answers