54
votes

I have a data frame which has two columns of dates in the format yyyy/mm/dd. I am trying to calculate the number of days between these two dates for each observation within the data frame (and create a new variable with this number of days in it).

So far I have tried using the answer given here:

Calculate the number of weekdays between 2 dates in R

but editing the code so that it calculates the total number of days rather than just the number of weekdays. This just produced an error:

Error in del/by : non-numeric argument to binary operator
In addition: Warning message:
In Ops.factor(to, from) : - not meaningful for factors

I have also attempted using this code:

finish <- as.Date(survey$date, format="%yyyy/%mm/%dd")
start <- as.Date(survey$tx_start, format="%yyyy/%mm/%dd")
date_diff<-as.data.frame(finish-start)

with the plan of using "cbind" or something equivalent to combine the data frame "date_diff" with my data frame "survey", which has the data in it.

Although this does not give me any errors, the observations within the "finish" and "start" objects are all "NA_real_", and the date_diff data frame therefore has all the observations listed as NA.

If someone could point me in the right direction that would be great! All the other questions I have found seem to deal with individual dates rather than dates held in variables, and applying those techniques to variables hasn't been working for me.

5
This code works for me. Not sure if it's suitable for your example: d1 = '2012-07-26'; d2 = '1970-12-25'; dd1 = as.Date(d1); dd2 = as.Date(d2); dd1-dd2; - Pete855217
My main issue is dealing with columns of dates though. I have 2 columns of dates, and I want to calculate the number of days between them as a third column. Your code works, but is there a way to adapt it so that it deals with columns of dates within a data frame? - Timothy Alston
Referencing d1 and d2 as data frame columns, e.g. survey$date, should work. - Pete855217

5 Answers

68
votes

Without seeing your data (you can use the output of dput(head(survey)) to show us) this is a shot in the dark:

survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))

survey$date_diff <- as.Date(as.character(survey$date), format="%Y/%m/%d")-
                  as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
survey
       date   tx_start date_diff
1 2012/07/26 2012/01/01  207 days
2 2012/07/25 2012/01/01  206 days
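
The "not meaningful for factors" warning in the question suggests the date columns were read in as factors (the default for data.frame() and read.csv() before R 4.0.0). Arithmetic on factors returns NA, so the values have to be converted to Date first. A minimal sketch, assuming a made-up survey data frame with factor columns:

```r
# Hypothetical data; stringsAsFactors = TRUE mimics the pre-4.0.0 default.
survey <- data.frame(
  date     = c("2012/07/26", "2012/07/25"),
  tx_start = c("2012/01/01", "2012/01/01"),
  stringsAsFactors = TRUE
)

# Subtracting factors directly is not meaningful: R warns and returns NA.
raw_diff <- suppressWarnings(survey$date - survey$tx_start)

# Converting to character, then to Date, gives a usable difference in days.
finish <- as.Date(as.character(survey$date), format = "%Y/%m/%d")
start  <- as.Date(as.character(survey$tx_start), format = "%Y/%m/%d")
survey$date_diff <- as.numeric(finish - start)
```

Wrapping the result in as.numeric() is optional; it just stores a plain number of days instead of a difftime object.
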
55
votes

You could find the difference between dates in columns in a data frame by using the function difftime as follows:

df$diff_in_days<- difftime(df$datevar1 ,df$datevar2 , units = c("days"))
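
difftime() converts its arguments with as.POSIXct(), so with character columns it may be safer to convert to Date explicitly first. A sketch, assuming a hypothetical data frame df that uses the column names above:

```r
# Hypothetical data with dates stored as yyyy/mm/dd strings.
df <- data.frame(
  datevar1 = c("2012/07/26", "2012/07/25"),
  datevar2 = c("2012/01/01", "2012/01/01"),
  stringsAsFactors = FALSE
)

# Parse the strings once, with an explicit format.
d1 <- as.Date(df$datevar1, format = "%Y/%m/%d")
d2 <- as.Date(df$datevar2, format = "%Y/%m/%d")

# difftime() keeps the units attached; as.numeric() drops them.
df$diff_in_days <- difftime(d1, d2, units = "days")
df$diff_num     <- as.numeric(df$diff_in_days)
```
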
2
votes

You need to use the as.Date formats correctly.

E.g.

x = '2012/07/25'
xd = as.Date(x,'%Y/%m/%d')
xd    # Prints "2012-07-25"

R date formats are similar to *nix ones.

typeof(xd) returns "double": a Date is stored as the number of days since 1970-01-01.
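
This is likely why the code in the question returned NA: "%yyyy/%mm/%dd" is not a valid format string, since the codes for year, month and day are %Y, %m and %d. A quick check, using the same example date:

```r
x <- "2012/07/25"

# The format from the question does not match the input, so parsing fails.
bad <- as.Date(x, format = "%yyyy/%mm/%dd")

# The correct conversion codes parse the date.
good <- as.Date(x, format = "%Y/%m/%d")

# The underlying value is the day count since 1970-01-01.
days_since_epoch <- unclass(good)
```
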

1
votes

In Ronald's example, if the dates are in a different format (as below), modify the format parameter accordingly:

survey <- data.frame(date=c("2012-07-26","2012-07-25"),tx_start=c("2012-01-01","2012-01-01"))

survey$date_diff <- as.Date(as.character(survey$date), format="%Y-%m-%d")-
              as.Date(as.character(survey$tx_start), format="%Y-%m-%d")

survey:

        date   tx_start date_diff
1 2012-07-26 2012-01-01  207 days
2 2012-07-25 2012-01-01  206 days
0
votes

Following Ronald's example, I would like to add that you should decide whether the start and end dates themselves are to be included in the count of days between two dates. I faced the same problem and ended up using a third option with apply. It may be memory inefficient, but it helps to understand the problem:

survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))

survey$diff_1 <- as.numeric(
  as.Date(as.character(survey$date), format="%Y/%m/%d")-
    as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
)

survey$diff_2<- as.numeric(
  difftime(survey$date ,survey$tx_start , units = c("days"))
)

survey$diff_3 <- apply(X = survey[,c("date", "tx_start")],
                       MARGIN = 1,
                       FUN = function(x)
                         length(
                           seq.Date(
                             from = as.Date(x[2]),
                             to = as.Date(x[1]),
                             by = "day")
                           )
                       )

This gives the following date differences:

        date   tx_start diff_1   diff_2 diff_3
1 2012/07/26 2012/01/01    207 206.9583    208
2 2012/07/25 2012/01/01    206 205.9583    207
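
The three columns differ for reasons worth spelling out. diff_3 counts both endpoints, so it is always one more than diff_1. The fractional diff_2 is most likely a daylight saving effect: difftime() on character input goes through as.POSIXct() in the local time zone, and 0.9583 is 23/24 of a day. A sketch of how one might check this, assuming UTC is acceptable for date-only data:

```r
d_end   <- "2012/07/26"
d_start <- "2012/01/01"

# Forcing UTC avoids the one-hour daylight saving shift.
diff_utc <- as.numeric(difftime(d_end, d_start, tz = "UTC", units = "days"))

# Inclusive count: the plain Date difference plus one.
diff_inclusive <- as.numeric(
  as.Date(d_end, format = "%Y/%m/%d") - as.Date(d_start, format = "%Y/%m/%d")
) + 1
```

Adding one to the Date difference gives the same inclusive count as the seq.Date() approach without building a sequence for every row.
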