Average days between 2 dates columns R

Question

I have 2 dates columns in a dataframe with more than 100k observations

date1	startdate
2020-07-30 23:00:00	NA
2020-12-10 04:00:00	2021-06-30 20:00:00
2020-10-26 21:00:00	NA
2019-12-03 03:01:00	2020-02-01 01:00:00
NA	2020-06-28 07:30:00

I have to fill the missing values in startdate column, so my idea is to compute the average of days between date1 and startdate and to replace the NA in startdate after by doing an addition between this average and the date1 date.

DESIRED OUTPUT

For instance, if the average of days is 70, then :

date1	startdate
2020-07-30 23:00:00	2020-10-08 23:00:00
2020-12-10 04:00:00	2021-06-30 20:00:00
2020-10-26 21:00:00	2021-01-04 21:00:00
2019-12-03 03:01:00	2020-02-01 01:00:00
NA	2020-06-28 07:30:00

Reproducible example :

structure(list(date1 = structure(c(1594069500, 1575320400, 1603742400, NA, 1574975100, 1570845660, 1575061500, 1564714860, 1576544400, 1574802300, 1576198800, 1575338460, 1575666180, NA, 1594327800, 1595365200, 1594069800, 1591905600, 1594414800, NA), class = c("POSIXct", "POSIXt"), tzone = ""), startdate = structure(c(1599242400, 1577127600, NA, 1603396800, 1577516400, 1573714800, 1577689200, 1566374400, 1577343600, 1577516400, 1577343600, NA, 1577257200, NA, 1605193200, 1605106800, 1600358400, 1600358400, 1600272000, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(1L, 2L, 7591L, 8301L, 8692L, 8694L, 8699L, 8703L, 8706L, 8709L, 8710L, 8714L, 8715L, 8730L, 8732L, 8733L, 8736L, 8740L, 8745L, 8749L ), class = "data.frame")

Ronak Shah Ronak Shah · Accepted Answer · 2021-07-13T14:22:30

You can use difftime to calculate average duration between startdate and date1. Replace NA values in startdate by adding the average value to date1.

avg <- as.numeric(mean(difftime(df$startdate, df$date1, units = 'secs'), na.rm = TRUE))
df$startdate[is.na(df$startdate)] <- df$date1[is.na(df$startdate)] + avg
df

Average days between 2 dates columns R

2 Answers