OK, I have a complex function built using data.frames, and in trying to speed it up I've turned to data.table. I'm totally new to it, so I'm quite befuddled. Anyhow, I've made a much simpler toy example of what I want to do, but I can't work out how to translate it into data.table syntax. Here is the example in data.frame form:
library(data.table)
rows <- 10
data1 <- data.frame(id = 1:rows,
                    a = seq(0.2, 0.55, length.out = rows),
                    b = seq(0.35, 0.7, length.out = rows),
                    c = seq(0.4, 0.83, length.out = rows),
                    d = seq(0.6, 0.87, length.out = rows),
                    e = seq(0.7, 0.99, length.out = rows),
                    f = seq(0.52, 0.90, length.out = rows)
)
DT1 <- data.table(data1)  # for later
data2 <- data.frame(id = 3:1,
                    a = rep(3, 3),
                    d = rep(2, 3),
                    f = rep(1, 3)
)
m.names <- c("a", "d", "f")
pos <- match(data2$id, data1$id)  # row of data1 matching each row of data2, in data2's order
data1[pos, m.names] <- data1[pos, m.names] + data2[, m.names]
Note that in the last step I want to add the new data to the pre-existing values in the rows with matching ids, and that the addition is vectorised across several columns.
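To make the row alignment in that last step concrete, here is a small base-R sketch. It assumes every id in data2 occurs in data1, and the increments in data2 are hypothetical values that differ per id (unlike rep(3, 3)), so any mix-up between rows would show up. match() returns, for each row of data2, the position of the matching id in data1, so data2 is used in its own row order on the right-hand side. The names pos and before are illustrative only.

```r
# Minimal sketch: hypothetical increments that differ per id,
# so a row misalignment would be visible (rep(3, 3) would hide it)
rows <- 10
data1 <- data.frame(id = 1:rows,
                    a = seq(0.2, 0.55, length.out = rows),
                    d = seq(0.6, 0.87, length.out = rows))
data2 <- data.frame(id = 3:1,
                    a = c(30, 20, 10),  # id 3 gets +30, id 1 gets +10
                    d = c(3, 2, 1))
m.names <- c("a", "d")
before <- data1

# pos[k] is the row of data1 whose id equals data2$id[k]; here c(3, 2, 1)
pos <- match(data2$id, data1$id)

# Row k of data2 pairs with row pos[k] of data1, so data2 keeps its own
# row order on the right-hand side
data1[pos, m.names] <- data1[pos, m.names] + data2[, m.names]

print(data1[1:4, ])
```

Writing data2[pos, m.names] on the right instead would pair data1's id-3 row with data2's id-1 row; with constant increments like those in the question, both forms happen to give the same numbers.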
In data.table syntax I've only gotten this far:
DT1[id %in% data2$id, m.names, with = FALSE]
This selects the values I want to add to, but after that I am lost. I would appreciate any help!
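Continuing from that selection, one possible way to finish the update in data.table is sketched below. It assumes every id in data2 exists in DT1, and the increments in data2 are hypothetical values that differ per id. One catch: `id %in% data2$id` returns rows in DT1's order (ids 1, 2, 3), while data2 lists ids 3, 2, 1, so the sketch uses match() to get DT1's row numbers in data2's order, then assigns the sums by reference with `:=`. The names idx, old_vals, new_vals and before are illustrative only.

```r
library(data.table)

rows <- 10
DT1 <- data.table(id = 1:rows,
                  a = seq(0.2, 0.55, length.out = rows),
                  d = seq(0.6, 0.87, length.out = rows),
                  f = seq(0.52, 0.90, length.out = rows))
# Hypothetical increments that differ per id, so alignment is visible
data2 <- data.frame(id = 3:1,
                    a = c(30, 20, 10),
                    d = c(3, 2, 1),
                    f = c(0.3, 0.2, 0.1))
m.names <- c("a", "d", "f")
before <- copy(DT1)

# DT1 row numbers in data2's row order (ids 3, 2, 1 -> rows 3, 2, 1)
idx <- match(data2$id, DT1$id)

# Current values of the target cells, one row per row of data2
old_vals <- DT1[idx, m.names, with = FALSE]

# Column-wise sums as a named list, assigned back by reference
new_vals <- Map(`+`, old_vals, data2[m.names])
DT1[idx, (m.names) := new_vals]

print(DT1[1:4])
```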
EDIT:
OK, I've figured out part of it: I can use the selection from the last line of code above to do the vectorised addition, using data2 to store the summed values, as follows:
data2[, m.names] <- data2[, m.names] + data.frame(DT1[match(data2$id, id), m.names, with = FALSE])
Even with 2.5 million rows in DT1, 10,000 rows in data2 and 6 matching columns, this takes only 0.004 s, but I still need to assign the updated data2 values back to the matching rows of data1, in columns whose names are only known at run time (m.names).
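For columns whose names are only known at run time, a loop over m.names with data.table's set() is one option, sketched here on the question's toy data. The ?set help page describes set() as a low-overhead, loopable version of `:=`, and it modifies DT1 in place rather than copying it. The sketch assumes every id in data2 is present in DT1; idx and before are illustrative names.

```r
library(data.table)

rows <- 10
DT1 <- data.table(id = 1:rows,
                  a = seq(0.2, 0.55, length.out = rows),
                  d = seq(0.6, 0.87, length.out = rows),
                  f = seq(0.52, 0.90, length.out = rows))
data2 <- data.frame(id = 3:1, a = rep(3, 3), d = rep(2, 3), f = rep(1, 3))
m.names <- c("a", "d", "f")  # column names chosen at run time
before <- copy(DT1)

# DT1 row numbers in data2's row order; assumes no unmatched ids
idx <- match(data2$id, DT1$id)
stopifnot(!anyNA(idx))

# One in-place column update per name; no copy of DT1 is made
for (col in m.names) {
  set(DT1, i = idx, j = col, value = DT1[[col]][idx] + data2[[col]])
}

print(DT1[1:4])
```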
setkey(setDT(data1), id)
data1[data2, `:=`(a = a + i.a, d = d + i.d, f = f + i.f)]
– David Arenburg
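The update join in that comment hard-codes a, d and f. Below is a sketch of one way to drive it from m.names instead, assuming a data.table version that supports the `on =` argument (so no setkey() is needed). mget() collects the columns named in m.names, and the "i." prefix selects data2's copy of each name. This generalisation is an assumption layered on the comment, not part of it; DT2 and before are illustrative names.

```r
library(data.table)

rows <- 10
DT1 <- data.table(id = 1:rows,
                  a = seq(0.2, 0.55, length.out = rows),
                  d = seq(0.6, 0.87, length.out = rows),
                  f = seq(0.52, 0.90, length.out = rows))
data2 <- data.frame(id = 3:1, a = rep(3, 3), d = rep(2, 3), f = rep(1, 3))
DT2 <- as.data.table(data2)
m.names <- c("a", "d", "f")
before <- copy(DT1)

# Update join on id: for each row of DT2, the matching DT1 row gets
# DT1's value (a, d, f) plus DT2's value (i.a, i.d, i.f), by reference
DT1[DT2,
    (m.names) := Map(`+`, mget(m.names), mget(paste0("i.", m.names))),
    on = "id"]

print(DT1[1:4])
```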