0
votes

I have salesorder data in the following (sample) format:

salesorder <- c('TM001', 'TM002', 'TM003', 'TM004')
esttxndate <- as.Date(c('2018-10-01', '2018-10-01', '2018-10-04', '2018-10-06'))
potxndate <- as.Date(c('2018-10-07', '2018-10-06', '2018-10-14', '2018-10-18'))
intxndate <- as.Date(c('2018-11-06', '2018-11-05', '2018-11-13', '2018-11-17'))
salesorder <- data.frame(salesorder, esttxndate, potxndate, intxndate)
 salesorder esttxndate  potxndate   intxndate
1   TM001   2018-10-01  2018-10-07  2018-11-06  
2   TM002   2018-10-01  2018-10-06  2018-11-05  
3   TM003   2018-10-04  2018-10-14  2018-11-13  
4   TM004   2018-10-06  2018-10-18  2018-11-17

I am trying to create a new dataframe which looks at the dates of each salesorder and outputs the status on each date:

     date       TM001   TM002   TM003   TM004
01 2018-10-01    est     est     dne      dne
02 2018-10-02    est     est     dne      dne
.
07 2018-10-07    pro     pro     est      est
.
32 2018-11-01    pro     pro     pro      pro
.
37 2018-11-06    inv     inv     pro      pro
.
48 2018-11-17    inv     inv     inv      inv

I was able to get the list of dates out using the min and max functions (saved as mindate & maxdate). I then started a new data.frame with the values from the date range as:

mindate <- min(esttxndate, potxndate, intxndate)
maxdate <- max(esttxndate, potxndate, intxndate)
dates <- data.frame(as.Date(as.Date(mindate):as.Date(maxdate), origin="1970-01-01"))
names(dates)[1] <- "date"

I am at a loss for what to do next as I have tried to utilize user-defined functions and applying across rows on both the newly created dates dataframe and on the previous salesorder dataframe.

I am coming from a background in Stata and was able to produce the desired dataset by first going through and saving temp values for each date (ex. local variable potxndate_TM001 = 2018-10-07)

ds *date
foreach dt in `r(varlist)' {
    forval i = 1/`=_N' {
        local so = salesorder[`i']
        local `dt'_`so' = `dt'[`i']
    }
}

Once all the dates are saved as local variables I dropped all the variables besides salesorder, transposed the table and created a new variable date ranging from the minimum date to the maximum date. I then ran the following to get the values based on the date column and the locally saved variables.

ds TM*
foreach so in `r(varlist)' {
    forval i = 1/`=_N' {
        if `intxndate_`so'' <= date[`i'] {
            replace `so' = "inv" in `i'
        }
        else if `potxndate_`so'' <= date[`i'] {
            replace `so' = "pro" in `i'
        }
        else if `esttxndate_`so'' <= date[`i'] {
            replace `so' = "est" in `i'
        }
        else if `esttxndate_`so'' > date[`i'] {
            replace `so' = "dne" in `i'
        }
    }
}

I believe there is a way to do this in R without creating the intermediate local variables / modifying original dataset, which should be much more efficient and faster (?).

2
Hello, can you explain what esttxndate, potxndate, and intxndate mean and how I can derive the status from them? The int_overlaps function (and other functions) from lubridate might be useful here.MartijnVanAttekum
@MartijnVanAttekum if the date is prior to esttxndate, the salesorder "dne" (does not exsist) if the date is on or after the esttxndate but before the potxndate, the sales order is "est" or being estimated if the date is on or after the potxndate but before the intxndate, the sales order is "pro" or in production if the date is on or after the intxndate, the sales orer is "inv" or being invoicedCarl Fixsen
The Stata code could be rewritten, I suspect, to cut down on most of the looping. If that interests you, post a new question but follow the Stata tag wiki to give example data in readable form.Nick Cox

2 Answers

0
votes

Loops in R tend to be slow, a faster solution is to use functional programming tools such as those in the purrr package or the function apply rather than loops as you have used in stata.

To solve this, I have written my own function, most_recent_txn which returns the status of a given sales order on a given date, then applied this function to all dates in the vector dates$date, using purrr::map_chr().

Then, to do this for all sales orders (rows in the original dataframe, salesorder), I have written a function which carries this out for a given row and applied this to all rows using the apply function.


most_recent_txn <- function(as_of_date, order_dates) {
  # return the column name of the last txn step compleated, as of the date given.
  last_step = "dne"

  # if there is any recorded activity at that point, we assign the most recent
  # activity to last step
  if(max(which(t(order_dates)<=as_of_date))>0){
    last_step = names(order_dates)[max(which(t(order_dates)<=as_of_date))]
  }
  return(last_step)
}


progress_of_sales_order <- function(order) {
   purrr::map_chr(dates$date,most_recent_txn,order_dates=order[2:4])
}

status = cbind(dates$date,
          apply(salesorder,1,FUN=progress_of_sales_order))

Not the most elegant solution, it relies on using the column order of salesorder dataframe to indicate the step in the process and implicitly assumes that the status of a sales order cannot go backward (eg. purchase order after invoice).

0
votes

Created a function called status

status <- function(tstdate, estdate, podate, invdate) {
   ifelse (tstdate >= invdate, "inv",
      ifelse (tstdate >= podate, "pro",
         ifelse (tstdate >= estdate, "est", "dne")))
}

I was then able to run the following across the dates:

final <- data.frame(apply(dates,c(1,2),function(x) {
   status(x, salesorder$esttxndate, salesorder$potxndate, salesorder$intxndate)
}

The rest is formatting to get the data frame to my liking.

Minor note: this solution will have rows & columns reversed from how the question proposes for desired results. The issue with this way is not being able to name the columns dates, as numbers cannot be used in names.