I have two data.tables in R as follows:
DT_A
with columns sid
, date
, value1
, value2
, where sid
is a primary key and date
is a secondary key.
DT_B
has columns sid
, date1
, date2
, date3
, ...., date12
. Here sid
is the primary key and hence each row corresponds to the unique sid
and the other column names correspond to secondary keys in DT_A
.
I want to add another column to DT_A
where the ith row contains the corresponding element from the table DT_B[sid_i, date_i]
Below is the example code and the desired output:
require(data.table)
DT_A <- data.table(sid = c(1,2,3,4,5,1,3),
date = c("Jan 2012", "Feb 2012", "April 2012", "May 2012",
"Dec 2012", "Feb 2012", "Oct 2012"),
value1 = rep("1", 7),
value2 = rep("1", 7))
DT_B <- data.table(sid = as.character(c(1,2,3,4,5)),
"Jan 2012" = rep("1", 5),
"Feb 2012" = rep("2", 5),
"March 2012" = rep("3", 5),
"April 2012" = rep("4", 5),
"May 2012" = rep("5", 5),
"June 2012" = rep("6", 5),
"July 2012" = rep("7", 5),
"Aug 2012" = rep("8", 5),
"Sept 2012" = rep("9", 5),
"Oct 2012" = rep("10", 5),
"Nov 2012" = rep("11", 5),
"Dec 2012" = rep("12", 5))
#Set Keys
setkey(DT_A, sid, date)
setkey(DT_B, sid)
#Define Function fun1
fun1 <- function(x){
tryCatch(DT_B[x[1], x[2], with=FALSE], error = function(e) NULL)
}
#Desired Output
DT_A$newCol <- sapply(apply(DT_A, 1, fun1),"[[",1)
DT_A
Although my current method works on this small example, my actual DT_A has 20 million rows. This method completely hangs there. Can we do this operation in a more efficient way using data.table
or any other method?
DT_A
– David Arenburg