1
votes

I would like to increment the year in a Dates column based on a condition - If the StartDate is later than the EndDate, then increment the year in EndDate, For example:

Data:

input_df <- structure(list(C1 = c("A", "C", "E", "G", "I"), C2 = c("B", "D", "F", "H", "J"), 
                           StartDate = c("12/23/2019", "12/24/2019", "12/28/2019", "01/01/2019", "05/15/2019"), EndDate = c("01/07/2019", "12/25/2019", "12/31/2019", "04/11/2019", "05/18/2019")), class = "data.frame", row.names = c(NA, -5L))

input_df:
  C1 C2  StartDate    EndDate
1  A  B 12/23/2019 01/07/2019
2  C  D 12/24/2019 12/30/2019
3  E  F 12/28/2019 12/30/2019
4  G  H 01/01/2019 04/18/2019
5  I  J 05/15/2019 05/25/2019

Expected output: input_df:

  C1 C2  StartDate    EndDate
1  A  B 12/23/2019 01/07/2020
2  C  D 12/24/2019 12/25/2019
3  E  F 12/28/2019 12/31/2019
4  G  H 01/01/2019 04/11/2019
5  I  J 05/15/2019 05/18/2019

I tried to do the following to achieve this:

library(lubridate)
input_df$EndDate[input_df$EndDate < input_df$StartDate] <- mdy(input_df$EndDate) + years(1)

But the output I receive is:

  C1 C2  StartDate    EndDate
1  A  B 12/23/2019      18268
2  C  D 12/24/2019 12/25/2019
3  E  F 12/28/2019 12/31/2019
4  G  H 01/01/2019 04/11/2019
5  I  J 05/15/2019 05/18/2019

with a warning 'number of items to replace is not a multiple of replacement length'

3

3 Answers

4
votes

I think you are getting the error because you only subset the left side. Also, you probably want to work with the same class for both StartDate and EndDate. Try this:

input_df$StartDate <- mdy(input_df$StartDate)
input_df$EndDate <- mdy(input_df$EndDate)

input_df$EndDate[input_df$EndDate < input_df$StartDate] <- 
  input_df$EndDate[input_df$EndDate < input_df$StartDate] + years(1)

input_df
2
votes

Using tidyverse package:

input_df %>% mutate(StartDate = as.Date(StartDate, format = "%m/%d/%Y"), 
EndDate = as.Date(EndDate, format = "%m/%d/%Y"), 
EndDate_N = if_else(StartDate >= EndDate, EndDate + years(1), EndDate))
0
votes

We can also do this with base R

#Convert Start and End date to `POSIXlt` format
input_df[c('StartDate','EndDate')] <- lapply(input_df[c('StartDate','EndDate')], 
                                       as.POSIXlt, format = "%m/%d/%Y")
#Get the row index where StartDate > EndDate
inds <- input_df$StartDate > input_df$EndDate
#Increment the year for those indexes
input_df$EndDate[inds]$year <- input_df$EndDate[inds]$year + 1

input_df
#  C1 C2  StartDate    EndDate
#1  A  B 2019-12-23 2020-01-07
#2  C  D 2019-12-24 2019-12-25
#3  E  F 2019-12-28 2019-12-31
#4  G  H 2019-01-01 2019-04-11
#5  I  J 2019-05-15 2019-05-18