The sample datasets provided with the question indicate that the names of the columns may differ between datasets, e.g., column b
of dt1
and column b2
of dt2
are supposed to be added.
Here are two approaches which should be working for an arbitrary number of arbitrarily named pairs of columns:
- Working in long format
- EDIT: Update joins using
get()
- EDIT 2: Computing on the language
1. Working in long format
The information on corresponding columns can be provided in a look-up table or translation table:
library(data.table)
lut <- data.table(vars1 = c("a", "b", "c"), vars2 = c("a2", "b2", "c2"))
lut
vars1 vars2
1: a a2
2: b b2
3: c c2
In cases where column names are treated as data and the column data are of the same data type my first approach is to reshape to long format.
# reshape to long format
mdt1 <- melt(dt1[, rn := .I], measure.vars = lut$vars1)
mdt2 <- melt(dt2[, groupVar := .I], measure.vars = lut$vars2)
# update join to translate variable names
mdt2[lut, on = .(variable = vars2), variable := vars1]
# update join to add corresponding values of both tables
mdt1[mdt2, on = .(groupVar, variable), value := x.value + i.value]
# reshape backe to wide format
dt3 <- dcast(mdt1, rn + groupVar ~ ...)[, rn := NULL][]
dt3
groupVar a b c
1: 1 11 22 33
2: 1 12 23 34
3: 1 13 24 35
4: 2 24 35 46
5: 2 25 36 47
6: 2 26 37 48
7: 3 37 48 59
8: 3 38 49 60
9: 3 39 50 61
10: 3 40 51 62
2. Update joins using get()
Giving a second thought, here is an approach which is similar to OP's proposed for
loop and requires much less coding:
vars1 <- c("a", "b", "c")
vars2 <- c("a2", "b2", "c2")
dt2[, groupVar := .I]
for (iv in seq_along(vars1)) {
dt1[dt2, on = .(groupVar),
(vars1[iv]) := get(paste0("x.", vars1[iv])) + get(paste0("i.", vars2[iv]))][]
}
dt1[]
a b c groupVar
1: 11 22 33 1
2: 12 23 34 1
3: 13 24 35 1
4: 24 35 46 2
5: 25 36 47 2
6: 26 37 48 2
7: 37 48 59 3
8: 38 49 60 3
9: 39 50 61 3
10: 40 51 62 3
Note that dt1
is updated by reference, i.e., without copying.
Prepending the variable names vars1[iv]
by "x."
and vars2[iv]
by "i."
on the right hand side of :=
is to ensure that the right columns from dt1
and dt2
, resp., are picked in case of duplicated column names. See the Advanced: section on the j
parameter in help("data.table")
.
3. Computing on the language
This follows Matt Dowle's advice to create one expression to be evaluated, "similar to constructing a dynamic SQL statement to send to a server". See here for another use case.
library(glue) # literal string interpolation
library(magrittr) # piping used to improve readability
EVAL <- function(...) eval(parse(text = paste0(...)), envir = parent.frame(2))
data.table(vars1 = c("a", "b", "c"), vars2 = c("a2", "b2", "c2")) %>%
glue_data("{vars1} = x.{vars1} + i.{vars2}") %>%
glue_collapse( sep = ", ") %>%
{glue("dt1[dt2[, groupVar := .I], on = .(groupVar), `:=`({.})][]")} %>%
EVAL()
a b c groupVar
1: 11 22 33 1
2: 12 23 34 1
3: 13 24 35 1
4: 24 35 46 2
5: 25 36 47 2
6: 26 37 48 2
7: 37 48 59 3
8: 38 49 60 3
9: 39 50 61 3
10: 40 51 62 3
It starts with a look-up table which is created on-the-fly and subsequently manipulated to form a complete data.table statement
dt1[dt2[, groupVar := .I], on = .(groupVar), `:=`(a = x.a + i.a2, b = x.b + i.b2, c = x.c + i.c2)][]
as a character string. This string is then evaluated and executed in one go; no for
loops required.
As the helper function EVAL()
already uses paste0()
the call to glue()
can be omitted:
data.table(vars1 = c("a", "b", "c"), vars2 = c("a2", "b2", "c2")) %>%
glue_data("{vars1} = x.{vars1} + i.{vars2}") %>%
glue_collapse( sep = ", ") %>%
{EVAL("dt1[dt2[, groupVar := .I], on = .(groupVar), `:=`(", ., ")][]")}
Note that dot .
and curly brackets {}
are used with different meaning in different contexts which may appear somewhat confusing.