I have two data frames of different lengths. Both data frames contain dates. I would like to inner join the closest date column without going over. For example, if "2019-01-05" is the date and the two choices are "2019-01-06" and "2019-01-02", it joins with "2019-01-02". I tried using roll "inf" from data.table package but I am not getting the desired result. Here are 2 toy data frames that I want to join by date.
I would also like to group the join by the ticker column as well. In tidyverse, the code would be similar to:
df1 %>% group_by(ticker) %>% inner_join(df2, by = "Date") #but choose the closest date without going over.
df1 = data.frame(ticker = c("AAPL", "AAPL", "MSFT", "MSFT"), date = c("2019-01-06", "2019-02-06", "2019-01-06", "2019-05-02"))
df2 = data.frame(ticker = c("AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"), date = c("2019-01-03", "2019-01-07" , "2019-02-06", "2019-01-05", "2019-01-07", "2019-05-02"), randomVar = rnorm(6))
print(df1)
ticker date
1 AAPL 2019-01-06
2 AAPL 2019-02-06
3 MSFT 2019-01-06
4 MSFT 2019-05-02
print(df2)
ticker date randomVar
1 AAPL 2019-01-03 -0.5321493
2 AAPL 2019-01-07 -0.7909461
3 AAPL 2019-02-06 0.2121993
4 MSFT 2019-01-05 1.2336315
5 MSFT 2019-01-07 -0.2729354
6 MSFT 2019-05-02 -0.5349596
I would like to create df3 which is a join between df1 and df2. The date key column should only be the ones from the df1 date column.
In this case, our new df3 should look exactly like this.
print(df3)
ticker date randomVar
1 AAPL 2019-01-06 -0.5321493
2 AAPL 2019-02-06 0.2121993
3 MSFT 2019-01-06 1.2336315
4 MSFT 2019-05-02 -0.5349596
data.tablepackage. Tutorial here. - eipi10fuzzyjoincould also help this one I think - link here - Sinh Nguyen