I have salesorder data in the following (sample) format:
salesorder <- c('TM001', 'TM002', 'TM003', 'TM004')
esttxndate <- as.Date(c('2018-10-01', '2018-10-01', '2018-10-04', '2018-10-06'))
potxndate <- as.Date(c('2018-10-07', '2018-10-06', '2018-10-14', '2018-10-18'))
intxndate <- as.Date(c('2018-11-06', '2018-11-05', '2018-11-13', '2018-11-17'))
salesorder <- data.frame(salesorder, esttxndate, potxndate, intxndate)
salesorder esttxndate potxndate intxndate
1 TM001 2018-10-01 2018-10-07 2018-11-06
2 TM002 2018-10-01 2018-10-06 2018-11-05
3 TM003 2018-10-04 2018-10-14 2018-11-13
4 TM004 2018-10-06 2018-10-18 2018-11-17
I am trying to create a new dataframe which looks at the dates of each salesorder and outputs the status on each date:
date TM001 TM002 TM003 TM004
01 2018-10-01 est est dne dne
02 2018-10-02 est est dne dne
.
07 2018-10-07 pro pro est est
.
32 2018-11-01 pro pro pro pro
.
37 2018-11-06 inv inv pro pro
.
48 2018-11-17 inv inv inv inv
I was able to get the list of dates out using the min and max functions (saved as mindate & maxdate). I then started a new data.frame with the values from the date range as:
mindate <- min(esttxndate, potxndate, intxndate)
maxdate <- max(esttxndate, potxndate, intxndate)
dates <- data.frame(as.Date(as.Date(mindate):as.Date(maxdate), origin="1970-01-01"))
names(dates)[1] <- "date"
I am at a loss for what to do next as I have tried to utilize user-defined functions and applying across rows on both the newly created dates dataframe and on the previous salesorder dataframe.
I am coming from a background in Stata and was able to produce the desired dataset by first going through and saving temp values for each date (ex. local variable potxndate_TM001 = 2018-10-07)
ds *date
foreach dt in `r(varlist)' {
forval i = 1/`=_N' {
local so = salesorder[`i']
local `dt'_`so' = `dt'[`i']
}
}
Once all the dates are saved as local variables I dropped all the variables besides salesorder, transposed the table and created a new variable date ranging from the minimum date to the maximum date. I then ran the following to get the values based on the date column and the locally saved variables.
ds TM*
foreach so in `r(varlist)' {
forval i = 1/`=_N' {
if `intxndate_`so'' <= date[`i'] {
replace `so' = "inv" in `i'
}
else if `potxndate_`so'' <= date[`i'] {
replace `so' = "pro" in `i'
}
else if `esttxndate_`so'' <= date[`i'] {
replace `so' = "est" in `i'
}
else if `esttxndate_`so'' > date[`i'] {
replace `so' = "dne" in `i'
}
}
}
I believe there is a way to do this in R without creating the intermediate local variables / modifying original dataset, which should be much more efficient and faster (?).
int_overlaps
function (and other functions) from lubridate might be useful here. – MartijnVanAttekum