2
votes

I'm trying to build a column in a data.table by interacting with another data.table and having trouble referring to variables correctly to do this without a for-loop. Once I enter the second data.table, I can no longer seem to refer to the column in the first data.table correctly.

This is kind of similar to Subsetting a data.table using another data.table but I believe the merge-style solutions aren't appropriate.

Consider something like

#used numbers instead of dates to not have to deal with formatting, but idea is the same.

dt1 <- data.table(id = c('a', 'b', 'c'), date1 = c(1.1, 5.4, 9.1), amt= '100')
dt2 <- data.table(date2 = c(1.3, 3, 6.4, 10.5),
              dt2col = c(1.5, 1.02, 1.005, .99)
              )

dt1[result := prod(dt2[date2-(date1)>0,
                              dt2col
                       ]
     )
     ]

I want the result to be a new column in dt1 which is the product of dt2col when date2 (in dt2) is later than date1 (in dt1) for each specific row in dt1. I think the (date1) part is the problem.

I expect result[1] to be the product of dt2col for all of them, but result[2] to be the product of dt2col for only the dates after '5/4/2018', etc.

2
You need to first convert the to Date class before extracting the year as year('1/10/2018')# [1] 1akrun
In my actual data the dates are in the correct format, my bad that I didn't actually get that in the sample code. I was trying to illustrate it more than anything, and I didn't want to bog down my example with the strptime functions etc.Sam Asin
We could instead make date1= c(1, 5, 9) and date2 = c(1.2, 3.1, 6.1, 10.1) and achieve the same effect. Maybe I should change it?Sam Asin
@akrun I changed it to make it work automatically!Sam Asin

2 Answers

2
votes

Here are some data.table options:

1) Using non-equi joins:

dt1[, result := dt2[dt1, on=.(date2 > date1), prod(dt2col), by=.EACHI]$V1]
dt1

2) Using rolling joins after calculating the cumulative product:

setorder(dt2, -date2)
dt2[, cprod := cumprod(dt2col)]
dt1[dt2, result := cprod, on=.(date1=date2), roll=Inf]

output:

   id date1 amt   result
1:  a   1.1 100 1.522273
2:  b   5.4 100 0.994950
3:  c   9.1 100 0.990000
1
votes
Try this:

dt1[,`:=`(date1 = as.Date.character(date1,format = "%d/%m/%Y"))]
dt2[,`:=`(date2 = as.Date.character(date2,format = "%d/%m/%Y"))]

dt1[,`:=`(inds = lapply(X = date1,function(t){
intersect(x = which(year(t)==year(dt2$date2)),
          y = which(as.integer(dt2$date2-t)>0))}))][,result:=
          lapply(X = inds,function(t){prod(dt2$dt2col[t])})]



#   id      date1 amt    inds   result
#1:  a 2018-01-01 100 1,2,3,4 1.522273
#2:  b 2018-04-05 100     1,4    1.485
#3:  c 2018-01-09 100     1,4    1.485