I am trying to join two data frames; by ID, using the data.table package in R, as well as the lubridate package for dates. I only want to join rows with a Date2 that is 3 months prior or after the Date1
df1
| ID | Date1 |
|---|---|
| 1 | 2019-09-09 |
| 2 | 2019-09-09 |
| 3 | 2019-09-09 |
| 4 | 2019-09-09 |
Df2
| Id | Date 2 | Value |
|---|---|---|
| 1 | 2020-09-09 | 6 |
| 1 | 2019-10-09 | 7 |
| 2 | 2019-03-13 | 8 |
| 4 | 2019-10-27 | 15 |
Final data frame that I want
| ID | Date1 | Id | Date 2 | Value |
|---|---|---|---|---|
| 1 | 2019-09-09 | 1 | 2020-09-09 | 6 |
| 2 | 2019-09-09 | 2 | NA | NA |
| 3 | 2019-09-09 | NA | NA | NA |
| 4 | 2019-09-09 | 4 | 2019-10-27 | 15 |
I have included my code below. I am getting error saying the = after roll is not needed, but I think it is needed.
library(data.table)
library(lubridate)
df1 <- data.table(ID = seq_len(4L),
Date1 = as.Date("2019-09-09"))
df2 <- data.table(Id = c(1L, 1L, 2L, 4L),
Date2 = as.Date(c("2020-09-09","2019-10-09","2019-03-13","2019-10-27")),
Value = c(6L, 7L, 8L, 15L))
df2[, Date1Copy := Date2]
df2[, IDcopy := Id]
setkey(df2, ID, Date2) ## set the column to perform the join on
setkey(df, ID, Date1)
test = df1[df2, rollends = if (roll=="nearest") c(TRUE,TRUE)
else if (roll = as.Date(Date1)%m+% months(3)|roll = as.Date(Date1)%m-% months(3)) c(FALSE,TRUE)
else c(TRUE,FALSE)]```
Id=2? It seems like if you were only joining rows that fell within the time window,Idfor that record would beNA- Matt SummersgillID(any simpler than the method I proposed). Do you see a way to implement both the “nearest” selection and cross-table Date comparison in a single join statement? - Matt Summersgill