2
votes

I have two data.tables. I want to use the keys in one to count the occurrences in another.

DT1 <- data.table(A = c("v1","v2","v1","v2","v3","v4","v5","v6","v1","v2"),
                  B = c(1,11,111,2,22,222,3,33,333,4))
DT2 <- data.table(C = c("v1","v3","v99"))

setkey(DT1,A)
setkey(DT2,C)

Use DT2 to find rows in DT1 returns an outer join.

nrow(DT1[DT2,]) #returns 5 (v1*3, v3*1, v99) "v99" not in DT2
DT1[DT2,]

    A   B
1:  v1   1
2:  v1 111
3:  v1 333
4:  v3  22
5: v99  NA

Is it possible to force to left join with the same syntax?

1
DT1[DT2, nomatch = 0L] ?mtoto
Actually this is a left join... an outer join would be merge(DT1, DT2, by.x = "A", by.y = "C", all = TRUE). And v99 is present in DT2David Arenburg
thanks mtoto exactly what I was search for.iboboboru
You want to tabulate the (frequency of) keys in DT1[,A] according to the (partial) list of keys in DT2[,C]. Do you care about tabulating the other keys not included in DT2[,C] ? They get silently dropped, whereas R's base::tabulate allows the user to include them (by tweaking nbins). This is one case where dt's functionality is less than base.smci

1 Answers

6
votes

I want to use the keys in one to count the occurrences in another.

DT1[DT2, .N, on=c(A="C"), by=.EACHI]

     A N
1:  v1 3
2:  v3 1
3: v99 0