1
votes

I have an input dataframe with StartDate and EndDate, formatted as dates: input_df: C1 C2 StartDate EndDate A B 9/5/2019 12/14/2019 C D 4/12/2019 5/14/2019 E F 12/5/2019 12/15/2019 I am trying to achieve the following output based on some conditions: - If sys.date() is less than or equal to the EndDate, then I want to retain that row and add another row with Year+1 - If sys.Date() is greater than the EndDate, then replace 2019 from the year to 2020

The desired output is: output_df: C1 C2 StartDate EndDate A B 9/5/2019 12/14/2019 A B 9/5/2020 12/14/2020 C D 4/12/2020 5/14/2020 E F 12/5/2019 12/15/2019 E F 12/5/2020 12/15/2020 I have explored separate_rows and lubridate but not sure how to incorporate an if condition with those functions. The dataframe is big and I am trying to avoid for loops to do this?

1

1 Answers

1
votes

One option is to use case_when to increase one year on 'StartDate' , 'EndDate' columns and then bind with the original dataset

library(dplyr)
library(lubridate)
input_df %>%
    mutate_at(3:4, ~ mdy(.) %m+% years(1)) %>%
    bind_rows(input_df %>%
             mutate_at(3:4, mdy)) %>% 
    arrange_all() %>% 
    group_by(C1, C2) %>% 
    slice(if(first(EndDate) <= Sys.Date()) n() else row_number())
# A tibble: 5 x 4
# Groups:   C1, C2 [3]
#  C1    C2    StartDate  EndDate   
#  <chr> <chr> <date>     <date>    
#1 A     B     2019-09-05 2019-12-14
#2 A     B     2020-09-05 2020-12-14
#3 C     D     2020-04-12 2020-05-14
#4 E     F     2019-12-05 2019-12-15
#5 E     F     2020-12-05 2020-12-15

Or another option is to uncount based on the condition to expand the rows, then replace the last row by increasing one year

library(tidyr)
input_df %>% 
   mutate_at(3:4, mdy) %>%
   mutate(n = 1 + (Sys.Date() <= EndDate)) %>% 
   uncount(n) %>% 
   group_by(C1, C2) %>% 
   mutate_at(vars(-group_cols()), ~ replace(., n(), .[n()] + years(1))) 
# A tibble: 5 x 4
# Groups:   C1, C2 [3]
#  C1    C2    StartDate  EndDate   
#  <chr> <chr> <date>     <date>    
#1 A     B     2019-09-05 2019-12-14
#2 A     B     2020-09-05 2020-12-14
#3 C     D     2020-04-12 2020-05-14
#4 E     F     2019-12-05 2019-12-15
#5 E     F     2020-12-05 2020-12-15

Or using base R

nm1 <- c('StartDate', 'EndDate')
input_df[nm1] <- lapply(input_df[nm1], as.Date, format = "%m/%d/%Y")
i1 <- Sys.Date() <= input_df$EndDate
lst1 <- lapply(input_df[i1, nm1], function(date) 
   do.call(c, lapply(date, seq, length.out = 2, by = '1 year')))
input_df2 <- input_df[rep(seq_len(nrow(input_df)), i1 + 1),]
input_df2[rep(i1, i1 +1), nm1] <- lst1

data

input_df <- structure(list(C1 = c("A", "C", "E"), C2 = c("B", "D", "F"), 
    StartDate = c("9/5/2019", "4/12/2019", "12/5/2019"), EndDate = c("12/14/2019", 
    "5/14/2019", "12/15/2019")), class = "data.frame", row.names = c(NA, 
-3L))