I have three columns in my data set:
- The date when Item X was purchased
- The date when Item Y was purchased
- The date when Item Z was purchased
Sample Data:
set.seed(100)
Y <- sample(seq(as.Date('2016-01-01'), as.Date('2017-09-30'), by="day"), 100)
Y1 <- sample(seq(as.Date('2016-01-01'), as.Date('2017-03-31'), by="day"), 100)
Y2 <- sample(seq(as.Date('2016-01-01'), as.Date('2017-03-31'), by="day"), 100)
X1 <- sample(1:183, 100, replace = T)
Z1 <- sample(1:183, 100, replace = T)
X <- Y1 - X1
Z <- Y2 + Z1
dat <- data.frame("X"=X, "Y"=Y, "Z"=Z)
What I want to do is to create a dataset such that:
- If Y is purchased, look back within 6 months and find all those rows with most recent dates where X were purchased. Also, look forward from that date of Y purchase and find all the most recent dates when Z were purchased.
- If any of the dates are not available within 6 months range, select as NA
So that sample output would look like:
2017-04-10 2017-05-08 2017-06-19
2017-01-01 2017-05-08 2017-09-30
.
.
.
2017-04-10 2017-06-10 2016-06-19
NA 2017-05-08 2017-09-30
.
.
.
Here, most recent date means (for example): If bought Y on date June 30, 2017 then from June 30 what was the most recent date when X was bought (lets say it was May 15, 2017) and the most recent date when Z was bought (lets sat it was July 21, 2017). So for X it should be look backward within 6 months max and for Z, ti should be Look Forward into 6 months max.
Logic is easy, but I would prefer a way out in r-SQL or dplyr, if possible.
most recent dates
..._which_ most recent dates? Also, this might be a very ugly query in MySQL, which doesn't support analytic functions. – Tim Biegeleisenif bought Y on date June 30, 2017 then from June 30 what was the most recent date when X was bought (lets say it was May 15, 2017) and the most recent date when Z was bought (lets sat it was July 21, 2017). So for X it should be look backward within 6 months max and for Z, ti should be Look Forward into 6 months max.
– I_m_LeMarque