1
votes

I am switching a lot of my data manipulation pipelines from dplyr to data.table because of improved performance. I like the conciseness of the a[b] syntax for joins. dplyr::left_join(x, y) corresponds to y[x] in data.table. However the column order is different in both cases. Is there a way to replicate the column order you get from a dplyr left_join where new columns from y are added to the right-hand side of x, using the y[x] syntax of data.table? I know you can use merge(x, y, all.x = TRUE) but I was curious if the [] way can achieve the same result.

example

library(dplyr)
library(data.table)

x <- iris
y <- data.frame(Species = c('setosa', 'virginica'), foo = c(100, 200))

j1 <- left_join(x, y)

setDT(x)
setDT(y)

j2 <- y[x, on = 'Species']

j3 <- merge(x, y, all.x = TRUE)

How do I make j2 have the same column order as j1 and j3?

> head(j1, 1)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species foo
1:          5.1         3.5          1.4         0.2  setosa 100
> head(j2, 1)
   Species foo Sepal.Length Sepal.Width Petal.Length Petal.Width
1:  setosa 100          5.1         3.5          1.4         0.2
> head(j3, 1)
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width foo
1:  setosa          5.1         3.5          1.4         0.2 100
1

1 Answers

1
votes

We can do an assignment (:=), which would be more efficient as this does by reference

j2 <- copy(x)
j2[y, foo := foo, on = .(Species)]

-testing

all.equal(j1, type.convert(as.data.frame(j2), as.is = TRUE))
#[1] TRUE