I'm trying to get a column of my dataframe (ValuationDate) to reflect "quarter end" dates (12/31, 3/31, 6/30, 9/30 of the given year), based on bounding dates already provided.

So, I have an "Index" column (IDNum), a ".id" column (which numbers the rows within each IDNum) and two date fields (Report Date and Settlement Date).

Basically, given a Report Date and a Settlement Date, I want to generate a record for each quarter-end date between those two, plus the Report Date and Settlement Date themselves.

For example:

Report Date: 2001-12-29; Settlement Date: 2002-05-31

This should generate 4 records:

  1. 2001-12-29
  2. 2001-12-31
  3. 2002-03-31
  4. 2002-05-31

I've managed to replicate the rows the correct number of times, and can get the two "easy" dates in there (the first and the last for each record), but am struggling with the interim dates (the "What goes here" part of the code).

    library(zoo)
    library(splitstackshape)  # provides getanID()
    ClaimID_sam <- "1x1"
    ReptDat_sam <- strptime("2001-12-29", format = "%Y-%m-%d")
    SettDat_sam <- strptime("2002-05-31", format = "%Y-%m-%d")
    RecordCount <- as.integer((4*(as.yearqtr(SettDat_sam) - as.yearqtr(ReptDat_sam))) + 2)

    sam_DF <- data.frame(ClaimID_sam,ReptDat_sam,SettDat_sam,RecordCount)
    sam_DF <- as.data.frame(lapply(sam_DF,rep,RecordCount))
    sam_DF = getanID(sam_DF,"ClaimID_sam")

    sam_DF$ValDate <- ifelse(sam_DF$.id == 1, 
                                  as.Date(sam_DF$ReptDat_sam),
                                        "WHAT GOES HERE?????") 

    sam_DF$ValDate = ifelse(sam_DF$.id == sam_DF$RecordCount, 
                                  as.Date(sam_DF$SettDat_sam),
                                  sam_DF$ValDate)

EDIT: @g-grothendieck's solution is nearly perfect, but I'm seeing some oddities: the interim dates do not quite land on the quarter-end points.

   > do.call("rbind", by(ModData, ModData$ClaimID, add_dates))
       ClaimID Loss_Reported_Date settlementDate       ValuationDate
11X1.1    11X1         2001-12-29     2002-05-31 2001-12-29 00:00:00
11X1.2    11X1         2001-12-29     2002-05-31 2001-12-30 18:00:00
11X1.3    11X1         2001-12-29     2002-05-31 2002-03-30 18:00:00
11X1.4    11X1         2001-12-29     2002-05-31 2002-05-31 00:00:00
11X2.1    11X2         2002-04-06     2002-10-04 2002-04-06 00:00:00
11X2.2    11X2         2002-04-06     2002-10-04 2002-06-29 19:00:00
11X2.3    11X2         2002-04-06     2002-10-04 2002-09-29 19:00:00
11X2.4    11X2         2002-04-06     2002-10-04 2002-10-04 00:00:00
Where does the function as.yearqtr() come from? Please, provide the package. - Francesco Grossetti
from library(zoo), apologies. - Mykenk

1 Answer

Suppose our input is a data frame with ID, st and en columns, such as the one shown below. The st and en values for different IDs may differ.

The example below has the same start date and the same end date for each ID but the code handles the general case.

The function make_dates, applied to each ID (i.e. to each row), converts st and en to yearqtr class, creates a sequence of quarters between them, converts each quarter to its end-of-quarter date (frac = 1 means end of quarter), adds st and en to the result, caps every element at en, and removes duplicates. Use group_by/group_modify to apply it to each ID; at the end we show how to use do.call/by as an alternative.

library(dplyr)
library(zoo)

# test input
inp <- data.frame(ID = 1:2, st = as.Date("2001-12-29"), en = as.Date("2002-05-31"))

# given dates st & en return a vector of them and intervening qtr ends
make_dates <- function(st, en) {
  st <- as.Date(st)
  en <- as.Date(en)
  yq1 <- as.yearqtr(st)
  yq2 <- as.yearqtr(en)
  dates <- as.Date(seq(yq1, yq2, 1/4), frac = 1)
  unique(pmin(c(st, dates, en), en))
}

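# inside group_modify, .x is the data for the current ID, so st and en
# must be taken from it explicitly
inp %>%
  group_by(ID) %>%
  group_modify(~ cbind(.x, Date = make_dates(.x$st, .x$en))) %>%
  ungroup()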

giving:

# A tibble: 8 x 4
     ID st         en         Date      
  <int> <date>     <date>     <date>    
1     1 2001-12-29 2002-05-31 2001-12-29
2     1 2001-12-29 2002-05-31 2001-12-31
3     1 2001-12-29 2002-05-31 2002-03-31
4     1 2001-12-29 2002-05-31 2002-05-31
5     2 2001-12-29 2002-05-31 2001-12-29
6     2 2001-12-29 2002-05-31 2001-12-31
7     2 2001-12-29 2002-05-31 2002-03-31
8     2 2001-12-29 2002-05-31 2002-05-31
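
To see what each step inside make_dates contributes, here is a rough trace for the example dates from the question. It only repeats the calls already used above; the intermediate variable names (qtrs, qtr_ends, result) are introduced here purely for illustration.

```r
library(zoo)

st <- as.Date("2001-12-29")
en <- as.Date("2002-05-31")

# quarters that contain st and en
yq1 <- as.yearqtr(st)   # 2001 Q4
yq2 <- as.yearqtr(en)   # 2002 Q2

# one yearqtr value per quarter from yq1 to yq2 (illustrative name)
qtrs <- seq(yq1, yq2, by = 1/4)

# frac = 1 maps each quarter to its last day (illustrative name)
qtr_ends <- as.Date(qtrs, frac = 1)

# the last quarter end (2002-06-30) lies after en, so pmin caps it at en;
# unique then drops the duplicate en that the capping creates
result <- unique(pmin(c(st, qtr_ends, en), en))
result
```

The capping step is what keeps the final quarter end from overshooting the settlement date when en falls in the middle of a quarter.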

The dplyr pipeline could alternately be done without dplyr like this:

add_dates <- function(x) with(x, data.frame(ID, st, en, Date = make_dates(st, en)))
do.call("rbind", by(inp, inp$ID, add_dates))
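
The question builds its dates with strptime, which returns POSIXlt values rather than Dates. The as.Date calls at the top of make_dates look intended to normalise such input, so a quick check along these lines may be useful. This is a sketch that assumes the inputs are POSIXlt as in the question; the names rept, sett and out are made up for the example, and make_dates is repeated only so the block runs on its own.

```r
library(zoo)

# same function as above, repeated so this block is self-contained
make_dates <- function(st, en) {
  st <- as.Date(st)
  en <- as.Date(en)
  yq1 <- as.yearqtr(st)
  yq2 <- as.yearqtr(en)
  dates <- as.Date(seq(yq1, yq2, 1/4), frac = 1)
  unique(pmin(c(st, dates, en), en))
}

# POSIXlt inputs, built the same way as in the question (illustrative names)
rept <- strptime("2001-12-29", format = "%Y-%m-%d")
sett <- strptime("2002-05-31", format = "%Y-%m-%d")

out <- make_dates(rept, sett)
class(out)   # plain Date, with no time-of-day component
out
```

If the columns are POSIXct instead, it may be safer to convert them with as.Date and an explicit tz argument before calling make_dates, since the conversion from POSIXct to Date depends on the time zone used.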

Update

Have updated several times.