Create new column using row-wise operations in Data.Table R

Question

I have two data.tables in R as follows:

DT_A with columns sid, date, value1, value2, where sid is a primary key and date is a secondary key.

DT_B has columns sid, date1, date2, date3, ...., date12. Here sid is the primary key and hence each row corresponds to the unique sid and the other column names correspond to secondary keys in DT_A.

I want to add another column to DT_A where the ith row contains the corresponding element from the table DT_B[sid_i, date_i]

Below is the example code and the desired output:

require(data.table)
DT_A <- data.table(sid = c(1,2,3,4,5,1,3), 
                   date = c("Jan 2012", "Feb 2012", "April 2012", "May 2012", 
                            "Dec 2012", "Feb   2012", "Oct 2012"),
                   value1 = rep("1", 7), 
                   value2 = rep("1", 7))

DT_B <- data.table(sid = as.character(c(1,2,3,4,5)), 
                   "Jan 2012" = rep("1", 5),
                   "Feb 2012" = rep("2", 5),
                   "March 2012" = rep("3", 5),
                   "April 2012" = rep("4", 5),
                   "May 2012" = rep("5", 5),
                   "June 2012" = rep("6", 5),
                   "July 2012" = rep("7", 5),
                   "Aug          2012" = rep("8", 5),
                   "Sept 2012" = rep("9", 5),
                   "Oct 2012" = rep("10", 5),
                   "Nov 2012" = rep("11", 5),
                   "Dec 2012" = rep("12", 5))

#Set Keys
setkey(DT_A, sid, date)
setkey(DT_B, sid)

#Define Function fun1
fun1 <- function(x){ 
    tryCatch(DT_B[x[1], x[2], with=FALSE], error = function(e) NULL)
}

#Desired Output
DT_A$newCol <- sapply(apply(DT_A, 1, fun1),"[[",1)
DT_A

Although my current method works on this small example, my actual DT_A has 20 million rows. This method completely hangs there. Can we do this operation in a more efficient way using data.table or any other method?

I think the question should be reworded. This is not about doing row wise operations, this is about reshaping and merging. — geneorama

eddi eddi · Accepted Answer · 2014-06-18T17:26:09

Melt your second data.table:

library(reshape2)

DT_B.melted = melt(DT_B, id.vars = 'sid', variable.name = 'date')

Once it's melted, you can set both to the same key, and do your join/assign/whatnot:

setkey(DT_B.melted, sid, date)
setkey(DT_A, sid, date)

DT_A[DT_B.melted, newCol := value]

Create new column using row-wise operations in Data.Table R

1 Answers