I converted two time variables "Interaction2" and "Start2" to a week format so that I could aggregate my dataset by weeks. I want to create a third variable "Weeks" that is the difference between "Interaction2" and "Start2". I used the following command to convert the time variables into a standard date format of year, month, week (instead of year, month, day):

d1$Interaction2<-format(d1$Interaction,'%Y-%m-%U')
d1$Start2<-format(d1$Start,'%Y-%m-%U')

The results for "Interaction2" and "Start2" appeared to be formatted correctly, but they are character variables. I used the difftime function to obtain the difference, but the result is a decimal:

d1$Weeks<-difftime(d1$Interaction2,d1$Start2,units='weeks')

Shouldn't the result be an integer? Is the difftime command interpreting the last two digits as a day instead of a week? How can I obtain the difference as a count of weeks between the "Interaction2" week and the "Start2" week?

structure(list(Interaction2 = c("2015-02-06", "2015-02-08", "2015-03-09", 
"2015-03-11", "2015-03-12"), Start2 = c("1995-04-16", "1995-04-16", 
"1995-04-16", "1995-04-16", "1995-04-16"), Weeks = structure(c(1033.72023809524, 
1034.00595238095, 1038.14285714286, 1038.42857142857, 1038.57142857143
), units = "weeks", class = "difftime")), .Names = c("Interaction2", 
"Start2", "Weeks"), row.names = c(NA, 5L), class = "data.frame")

I also tried to convert the character variables using strptime before running the difference command:

d1$Interaction3<-strptime(as.character(d1$Interaction2),"%Y%m%U")
d1$Start3<-strptime(as.character(d1$Start2),"%Y%m%U")
d1$Weeks<-difftime(d1$Interaction3,d1$Start3,units='weeks')

But this resulted in NA's for the "Interaction3", "Start3" and "Weeks" variables:

structure(list(Interaction2 = c("2015-02-06", "2015-02-08", "2015-03-09", 
"2015-03-11", "2015-03-12"), Start2 = c("1995-04-16", "1995-04-16", 
"1995-04-16", "1995-04-16", "1995-04-16"), Weeks = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), units = "weeks", class = "difftime"), 
Start3 = structure(list(sec = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), min = c(NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), hour = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), mday = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), mon = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), year = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), wday = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), yday = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), isdst = c(-1L, 
-1L, -1L, -1L, -1L), zone = c("", "", "", "", ""), gmtoff = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst", 
"zone", "gmtoff"), class = c("POSIXlt", "POSIXt")), Interaction3 = structure(list(
sec = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), min = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_), hour = c(NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), mday = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), mon = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), 
year = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_), wday = c(NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_), yday = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_), isdst = c(-1L, 
-1L, -1L, -1L, -1L), zone = c("", "", "", "", ""), gmtoff = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst", 
"zone", "gmtoff"), class = c("POSIXlt", "POSIXt"))), .Names = c("Interaction2", 
"Start2", "Weeks", "Start3", "Interaction3"), row.names = c(NA, 
5L), class = "data.frame")
Give the lubridate package a try. - Dean MacGregor
If you want only the integer part of the result, you just need to round it down: floor(difftime(as.Date(d1$Interaction2),as.Date(d1$Start2),units='weeks')) - agstudy
Thank you, and yes, the floor command appears to work by rounding down. But what I am wondering is whether there should be a decimal result at all when difftime specifies units='weeks' and the dates are already in '%Y-%m-%U' (week) format. I would understand a decimal if the dates were in '%Y-%m-%d' format. - user3594490
format(d1$…,'%Y-%m-%U') yields the error invalid 'trim' argument. What do you mean: format(d1$…, format='%Y-%m-%U'), or format.Date(d1$…,'%Y-%m-%U')? - These two yield different results. - Armali

1 Answer

Try this (adding the units parameter). It relies on difftime implicitly coercing the character variables to date-times (POSIXct) and taking the numerical difference:

> difftime( mydf$Interaction2,mydf$Start2, units="weeks")
Time differences in weeks
[1] 1033.720 1034.006 1038.143 1038.429 1038.571
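
The decimals above are expected: when a string such as "2015-02-06" is coerced to a date, the last field is read as a day of the month, not a week number, so the result is elapsed days divided by 7. The following is a minimal sketch of one way to get a whole-week count instead, assuming the original date columns are still available. The sample dates and the helper name week_start are made up for illustration; the question does not show the original d1$Interaction and d1$Start values.

```r
# A week-formatted string is parsed as year-month-DAY, so this is a day count.
days_apart <- as.numeric(as.Date("2015-02-06") - as.Date("1995-04-16"))

# Hypothetical original dates (not taken from the question).
d1 <- data.frame(
  Start       = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01")),
  Interaction = as.Date(c("2015-01-03", "2015-01-04", "2015-01-15"))
)

# Plain difftime: elapsed days / 7, so the result is usually fractional.
elapsed_weeks <- as.numeric(difftime(d1$Interaction, d1$Start, units = "weeks"))

# Hypothetical helper: roll each Date back to the Sunday that starts its week,
# which matches the Sunday-based weeks counted by '%U'.
week_start <- function(x) x - as.POSIXlt(x)$wday

# The gap between two week starts is a whole multiple of 7 days,
# so integer division gives the number of week boundaries crossed.
week_gap_days <- as.numeric(week_start(d1$Interaction) - week_start(d1$Start),
                            units = "days")
d1$Weeks <- week_gap_days %/% 7
```

Working from Date objects rather than date-times also avoids the extra fraction that a daylight-saving shift can add when difftime coerces character values in the local time zone. If only the week boundary matters, floor() on the plain difference (as suggested in the comments) is not equivalent: it counts completed 7-day spans since the start date, not calendar weeks.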