
I am currently getting accustomed to data.table (for the amazing speed, as well as non-equi joins).

I find the join syntax a little counterintuitive. Could someone help me out with how to look at left and right joins the "data.table" way?

Examples from r-datatable.com

require(data.table)
example(data.table)   # runs the ?data.table examples, which also create DT
# joins as subsets
X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))
X

DT[X, on="x"]                               # right join: keeps every row of X
X[DT, on="x"]                               # left join: keeps every row of DT
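A minimal sketch of what each form keeps, assuming DT is the table that example(data.table) creates (as far as I know, its definition is the one rebuilt below). The rule of thumb: in A[B, on = ...], every row of B, the table inside the brackets, is looked up in A, so B's rows are the ones preserved. The inner, anti and mult = "first" variants come from the same help page's options and are added here for comparison.

```r
library(data.table)

# Assumption: DT matches the table created by example(data.table);
# it is rebuilt here so the snippet runs on its own.
DT = data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  = data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# A[B, on = ...] means "for every row of B, look up matching rows of A".
right = DT[X, on = "x"]                  # all rows of X, plus matching DT rows
left  = X[DT, on = "x"]                  # all rows of DT, plus matching X rows
inner = DT[X, on = "x", nomatch = NULL]  # drop rows of X without a match
anti  = DT[!X, on = "x"]                 # rows of DT with no match in X
first = DT[X, on = "x", mult = "first"]  # at most one DT match per X row

nrow(right)  # 6: "c" and "b" each match 3 rows of DT
nrow(left)   # 9: one per DT row; the "a" rows get NA for v and foo
nrow(inner)  # 6: every key in X is present in DT
nrow(anti)   # 3: the "a" rows
nrow(first)  # 2: one row per row of X
```

Note that the result can have more rows than the table inside the brackets when a key matches several rows; mult = "first" (or "last") caps it at one match per row.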

Is the right join the default, with the new object (X) being the right-joined table?

Same goes for me: I prefer to do joins with merge, which in my opinion is simply more intuitive in most cases. See also rstudio-pubs-static.s3.amazonaws.com/… - hannes101
For the left-join part of your question, this is a really good post that you could go through: stackoverflow.com/a/54313203/8583393 - markus
Use merge on data.table objects. S3 method dispatch will call merge.data.table, so you still get data.table's speed gain. - abhiieor
When you have an X[Y] join, it means: "for every row in Y, try to find matching rows in X". Hence this is basically a left join to Y, and the result has one row per row of Y (more if a row of Y matches several rows of X). I agree it's kind of counter-intuitive. - David Arenburg
I think this post, including the 'summary' in the actual question, is useful: Why does X[Y] join of data.tables not allow a full outer join, or a left join? Then, of course, jangorecki's data.table answer in the canonical join Q&A: How to join (merge) data frames (inner, outer, left, right)? And, not least, @Frank's excellent tutorial. - Henrik
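Since several comments recommend merge, here is a minimal sketch of the merge() counterparts of the joins in the question, again assuming DT is the table from example(data.table). Because merge() is an S3 generic, calling it with a data.table as the first argument dispatches to merge.data.table.

```r
library(data.table)

# Assumption: same DT and X as in the question's example.
DT = data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  = data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

m_right = merge(DT, X, by = "x", all.y = TRUE)  # same rows as DT[X, on = "x"]
m_left  = merge(DT, X, by = "x", all.x = TRUE)  # same rows as X[DT, on = "x"]
m_inner = merge(DT, X, by = "x")                # inner join

class(m_right)  # "data.table" "data.frame": merge.data.table was used
names(m_right)  # clashing column v becomes v.x / v.y
```

The rows match the bracket forms, but the presentation differs: merge sorts by the key by default and suffixes clashing columns with .x/.y, whereas X[DT] keeps the row order of the table inside the brackets and names its clashing column i.v.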

1 Answer


Is the right join the default, with the new object (X) being the right-joined table?

The reason for that is consistency with the base R way of subsetting vectors/matrices; I think there is an entry in the FAQ about it. Notice that when you use := during a join you get a left join. There is an issue discussing how consistent joining with [ is with base R: #1615, as far as I remember.
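To illustrate the remark about :=, a minimal sketch of an update join, assuming DT and X are the tables from the question's example. X's foo column is copied into DT by reference; DT keeps all of its rows, so from DT's point of view this behaves like a left join.

```r
library(data.table)

# Assumption: same DT and X as in the question (DT rebuilt from the
# example(data.table) definition so the snippet is self-contained).
DT = data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
X  = data.table(x = c("c", "b"), v = 8:7, foo = c(4, 2))

# Update join: for each DT row matched by X, copy X's foo (i.foo) into a new
# DT column. DT is modified in place and keeps all 9 rows; unmatched rows
# (x == "a") get NA.
DT[X, on = "x", foo := i.foo]

print(DT)
```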