1 vote

I have two data frames of different lengths. Both contain dates. I would like to inner join on the closest date without going over. For example, if the date is "2019-01-05" and the two candidates are "2019-01-06" and "2019-01-02", it should join with "2019-01-02". I tried roll = Inf from the data.table package but I am not getting the desired result. Here are two toy data frames that I want to join by date.

I would also like to group the join by the ticker column. In tidyverse, the code would be similar to:

df1 %>% group_by(ticker) %>% inner_join(df2, by = "date")  # but choose the closest date without going over


df1 = data.frame(ticker = c("AAPL", "AAPL", "MSFT", "MSFT"), date = c("2019-01-06", "2019-02-06", "2019-01-06", "2019-05-02"))

df2 = data.frame(ticker = c("AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"), date = c("2019-01-03", "2019-01-07" , "2019-02-06", "2019-01-05", "2019-01-07", "2019-05-02"), randomVar = rnorm(6))

print(df1)
 ticker       date
1   AAPL 2019-01-06
2   AAPL 2019-02-06
3   MSFT 2019-01-06
4   MSFT 2019-05-02


print(df2)
 ticker       date  randomVar
1   AAPL 2019-01-03 -0.5321493
2   AAPL 2019-01-07 -0.7909461
3   AAPL 2019-02-06  0.2121993
4   MSFT 2019-01-05  1.2336315
5   MSFT 2019-01-07 -0.2729354
6   MSFT 2019-05-02 -0.5349596

I would like to create df3 which is a join between df1 and df2. The date key column should only be the ones from the df1 date column.

In this case, our new df3 should look exactly like this:

print(df3)
 ticker       date  randomVar
1   AAPL 2019-01-06 -0.5321493
2   AAPL 2019-02-06  0.2121993
3   MSFT 2019-01-06  1.2336315
4   MSFT 2019-05-02 -0.5349596
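One way to sanity-check that target output is a small base-R sketch (an illustration only, not one of the answers below): for each ticker, findInterval() returns the index of the last df2 date that does not go over the df1 date.

```r
# Base-R sketch of "closest date without going over", per ticker.
df1 <- data.frame(ticker = c("AAPL", "AAPL", "MSFT", "MSFT"),
                  date = as.Date(c("2019-01-06", "2019-02-06",
                                   "2019-01-06", "2019-05-02")))
df2 <- data.frame(ticker = c("AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2019-01-03", "2019-01-07", "2019-02-06",
                                   "2019-01-05", "2019-01-07", "2019-05-02")),
                  randomVar = c(-0.5321493, -0.7909461, 0.2121993,
                                1.2336315, -0.2729354, -0.5349596))

df1$randomVar <- unlist(lapply(seq_len(nrow(df1)), function(i) {
  sub <- df2[df2$ticker == df1$ticker[i], ]
  sub <- sub[order(sub$date), ]
  # findInterval gives the index of the last sub$date <= df1$date[i],
  # or 0 if every df2 date is later (no match without going over)
  j <- findInterval(df1$date[i], sub$date)
  if (j == 0) NA_real_ else sub$randomVar[j]
}))
print(df1)
```

This reproduces the df3 above, and rows with no eligible earlier date come back as NA rather than matching forward.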
See "rolling joins" in the data.table package. Tutorial here. - eipi10
fuzzyjoin could also help this one I think - link here - Sinh Nguyen

2 Answers

6 votes

There is already an accepted answer, but since the data.table tag is present, here is a data.table solution:

# convert sample data to data.table
setDT(df1)
setDT(df2)

# convert the date columns to real dates
df1[, date := as.IDate(date)]
df2[, date := as.IDate(date)]

# update df1 by reference with a rolling join: for each df1 row,
# roll = Inf takes the most recent df2 randomVar whose date is on or
# before the df1 date, within ticker
df1[, randomVar := df2[df1, x.randomVar, on = .(ticker, date), roll = Inf]]

#    ticker       date  randomVar
# 1:   AAPL 2019-01-06 -0.5321493
# 2:   AAPL 2019-02-06  0.2121993
# 3:   MSFT 2019-01-06  1.2336315
# 4:   MSFT 2019-05-02 -0.5349596
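If you would rather build a new table than update df1 by reference, the same rolling join can be written with df1 in i (a sketch using the same toy data): roll = Inf rolls the last df2 observation forward, so each df1 date gets the nearest prior-or-equal df2 row, and the date column in the result comes from df1, as the question requires.

```r
library(data.table)

df1 <- data.table(ticker = c("AAPL", "AAPL", "MSFT", "MSFT"),
                  date = as.IDate(c("2019-01-06", "2019-02-06",
                                    "2019-01-06", "2019-05-02")))
df2 <- data.table(ticker = c("AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"),
                  date = as.IDate(c("2019-01-03", "2019-01-07", "2019-02-06",
                                    "2019-01-05", "2019-01-07", "2019-05-02")),
                  randomVar = c(-0.5321493, -0.7909461, 0.2121993,
                                1.2336315, -0.2729354, -0.5349596))

# df2[df1] keeps every row of df1; roll = Inf fills randomVar from the
# most recent df2 row at or before each df1 date, within ticker
df3 <- df2[df1, on = .(ticker, date), roll = Inf]
print(df3)
```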
5 votes

This can be done in SQL with the default SQLite backend: left join df2 to df1 on ticker and on the df2 date being less than or equal to the df1 date, then group by df1 row and keep the joined row with the maximum df2 date. (This relies on SQLite's documented behavior that when max() appears in an aggregate query, bare columns such as randomVar are taken from the row where the maximum occurs.)

library(sqldf)
sqldf("select df1.*, max(df2.date), df2.randomVar from df1 
  left join df2 on df1.ticker = df2.ticker and df1.date >= df2.date
  group by df1.rowid
  order by df1.rowid")[-3]

giving:

  ticker       date  randomVar
1   AAPL 2019-01-06 -0.5321493
2   AAPL 2019-02-06  0.2121993
3   MSFT 2019-01-06  1.2336315
4   MSFT 2019-05-02 -0.5349596

Note

Inputs in reproducible form:

Lines1 <- "ticker       date
1   AAPL 2019-01-06
2   AAPL 2019-02-06
3   MSFT 2019-01-06
4   MSFT 2019-05-02"


Lines2 <- "ticker       date  randomVar
1   AAPL 2019-01-03 -0.5321493
2   AAPL 2019-01-07 -0.7909461
3   AAPL 2019-02-06  0.2121993
4   MSFT 2019-01-05  1.2336315
5   MSFT 2019-01-07 -0.2729354
6   MSFT 2019-05-02 -0.5349596"

df1 <- read.table(text = Lines1, as.is = TRUE)
df2 <- read.table(text = Lines2, as.is = TRUE)
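For completeness, newer dplyr versions support non-equi joins directly, so the tidyverse formulation the question sketches can now be written with join_by() and closest() (a sketch assuming dplyr >= 1.1.0 is installed):

```r
library(dplyr)

df1 <- data.frame(ticker = c("AAPL", "AAPL", "MSFT", "MSFT"),
                  date = as.Date(c("2019-01-06", "2019-02-06",
                                   "2019-01-06", "2019-05-02")))
df2 <- data.frame(ticker = c("AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"),
                  date = as.Date(c("2019-01-03", "2019-01-07", "2019-02-06",
                                   "2019-01-05", "2019-01-07", "2019-05-02")),
                  randomVar = c(-0.5321493, -0.7909461, 0.2121993,
                                1.2336315, -0.2729354, -0.5349596))

# closest(date >= date): match each df1 date to the nearest df2 date
# that is less than or equal to it, within ticker
df3 <- inner_join(df1, df2, by = join_by(ticker, closest(date >= date))) %>%
  select(ticker, date = date.x, randomVar)
print(df3)
```

The join keeps both date columns as date.x (from df1) and date.y (from df2); the select() keeps only df1's dates, matching the requested df3.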