I am trying to find a way around this request feature : [#2300] Add backwards and firstback to roll=TRUE which was mentioned in this post.
Basically I would like to perform the following "window-join" of X
looking up in Y
- left join on the first n columns (in the following example
{x,y}
) - AND select values of a last column (
t
in following example) inY
that falls into the[t-w1,t+w2]
interval where t is the last column inX
, typicallyt
would be a time column and{w1,w2}
some integers (likelyw1=w2=something
orw1=0
)
I built the following example (but feel free to provide another/better one)
library(data.table)
set.seed(123);
X <- data.table(x=c(1,1,1,2,2),y=c(T,T,F,F,F),t=as.POSIXct("08:00:00.000",format="%H:%M:%OS")+sample(0:999,5,TRUE)/1e3)
Y <- copy(X)
set.seed(123)
Y[,`:=`(IDX=.I,t=t+sample(c(-5:5)/1e3,5,T))]
Y <- rbindlist(list(Y, X[5,][,IDX:=6][,t:=t+0.001], X[5,][,IDX:=7][,t:=t+0.002]))
So with (w1,w2) = (.002,.002)
R) X R) Y
x y t x y t IDX
1: 1 TRUE 2013-01-25 08:00:00.286 1: 1 TRUE 2013-01-25 08:00:00.284 1
2: 1 TRUE 2013-01-25 08:00:00.788 2: 1 TRUE 2013-01-25 08:00:00.791 2
3: 1 FALSE 2013-01-25 08:00:00.407 3: 1 FALSE 2013-01-25 08:00:00.407 3
4: 2 FALSE 2013-01-25 08:00:00.882 4: 2 FALSE 2013-01-25 08:00:00.886 4
5: 2 FALSE 2013-01-25 08:00:00.940 5: 2 FALSE 2013-01-25 08:00:00.945 5
6: 2 FALSE 2013-01-25 08:00:00.941 6 #by hand
7: 2 FALSE 2013-01-25 08:00:00.942 7 #by hand
The result would be
R) ans
x y t IDX
1: 1 TRUE 2013-01-25 08:00:00.286 1
2: 1 TRUE 2013-01-25 08:00:00.788 NA
3: 1 FALSE 2013-01-25 08:00:00.407 3
4: 2 FALSE 2013-01-25 08:00:00.882 NA
5: 2 FALSE 2013-01-25 08:00:00.940 6,7
But: IDX
here could very well be a list if several rows of Y
(which can have more rows than X
) matched, one just one, or NA
if none matched.
I would be happy with some non-data.table answers too...