2
votes

I have two data tables like below

DT1

library(data.table) library(lubridate)

    DT1<-data.frame(
                    id=c(7,7,7,3,3,3),
                    start_time=c("2017-11-01 08:37:35","2017-11-01 09:07:44","2017-11-01 09:46:16","2017-11-01 10:32:29","2017-11-01 10:59:25","2017-11-01 13:24:12"),
                    cube=c(628,625,469,711,376,628)
                    )
DT1=data.table(DT1)


             id        start_time           cube
1:           7 2017-11-01 08:37:35          628
2:           7 2017-11-01 09:07:44          625
3:           7 2017-11-01 09:46:16          469
4:           3 2017-11-01 10:32:29          711
5:           3 2017-11-01 10:59:25          376
6:           3 2017-11-01 13:24:12          628

DT2

DT2<-data.frame(
  id=c(7,7,7,3,3,3),
  res_time=c("2017-11-01 08:35:30","2017-11-01 09:07:48","2017-11-01 09:46:32","2017-11-01 10:31:29","2017-11-01 10:57:25","2017-11-01 13:22:10"),
  res_cube=c(309,625,469,712,375,630)
)
DT2=data.table(DT2)

             id        res_time           res_cube
1:           7 2017-11-01 08:35:30          309
2:           7 2017-11-01 09:07:48          625
3:           7 2017-11-01 09:46:32          469
4:           3 2017-11-01 10:31:29          712
5:           3 2017-11-01 10:57:25          375
6:           3 2017-11-01 13:22:10          630

From these two I need to join the nearest res_time for DT1 row and associate all the DT1 rows with appending its most closest res_timeand res_cube and res_time as well for each id group. So I tried

DT1 = DT1[,start_time := as.character(start_time)]
DT2 = DT2[,res_time := as.character(res_time)]


DT1 = DT1[,start_time := parse_date_time2(start_time,orders="YmdHMS",tz="NA")]
DT2 = DT2[,res_time := parse_date_time2(res_time,orders="YmdHMS",tz="NA")]


setkeyv(DT1, c("id","start_time"))    
setkeyv(DT2, c("id","res_time"))    
ans = DT1[DT2, roll=Inf]

But this gave something like this

              id    start_time              cube         res_cube
1:           7 2017-11-01 08:35:30           NA             309
2:           7 2017-11-01 09:07:48           NA             625
3:           7 2017-11-01 09:46:32           NA             469
4:           3 2017-11-01 10:31:29           NA             712
5:           3 2017-11-01 10:57:25           NA             375
6:           3 2017-11-01 13:22:10           NA             630

I did not get the res_time and I think I am doing something wrong in the rolling join.

I also noticed that I am getting res_time in the column of start_time of the joined result.

Any help is appreciated.

1
Can you provide reproducible versions of DT1 and DT2? It'll save answerers having to independently do a bunch of formatting and conversions to get something to test.thelatemail
@thelatemailsure I will update asap.Ricky
@thelatemail apologies forgot to change the column name.Updated itRicky

1 Answers

1
votes

I managed to get the results by a Rolling Forward instead of a Rolling Backward

setDT(DT1)            
setDT(DT2)            
DT1 = DT1[,start_time := parse_date_time2(start_time,orders="YmdHMS",tz="NA")]
DT2 = DT2[,res_time := parse_date_time2(res_time,orders="YmdHMS",tz="NA")]

DT1 [,time:=start_time]
DT2[,time:=res_time]

setkey(DT1 ,id,time)    
setkey(DT2,id,time)    

ans = DT2[DT1, roll=T]# perform a Rolling Forward where Each start_time in DT1 is matched to the closest res_time in DT2.

ans

   id            res_time cube                time          start_time i.cube
1:  3 2017-11-01 10:31:29  712 2017-11-01 10:32:29 2017-11-01 10:32:29    711
2:  3 2017-11-01 10:57:25  375 2017-11-01 10:59:25 2017-11-01 10:59:25    376
3:  3 2017-11-01 13:22:10  630 2017-11-01 13:24:12 2017-11-01 13:24:12    628
4:  7 2017-11-01 08:35:30  309 2017-11-01 08:37:35 2017-11-01 08:37:35    628
5:  7 2017-11-01 08:35:30  309 2017-11-01 09:07:44 2017-11-01 09:07:44    625
6:  7 2017-11-01 09:07:48  625 2017-11-01 09:46:16 2017-11-01 09:46:16    469

Hope it helps