
I am trying to apply diff() across a series of columns containing dates. I want the differences between consecutive dates: date1 and date2, date2 and date3, etc.

I am interested in:

  1. the actual difference between the dates (days)
  2. if all dates of a row are in order (diff >= 0, by row)

I can use diff() on a series of dates (e.g. on the first row --> diff(unlist(df[1,])) ). I just need to apply this per row, I guess using apply(), but for some reason I can't work it out. Some dates are missing, which is allowed in my study.

Hopefully this is very easy for you guys...

df <- structure(list(date1 = structure(c(-10871, -13634, -15937, -15937,
 -290, -2323), class = "Date"), date2 = structure(c(16678, NA,16037, 16659, 
16538, 16626), class = "Date"), date3 = structure(c(16685,16688, NA, 16659,
 16568, 16672), class = "Date"), date4 = structure(c(16701, 16695, 16670,
 16661, 16582, 16672), class = "Date"), date5 = structure(c(16709, 16695, 
16661, 16667, 16619, 16692), class = "Date")), .Names = c("date1","date2", 
"date3", "date4", "date5"), row.names = c("2", "3", "4", "5", "6", "7"), 
class = "data.frame")
df
apply converts everything to character, causing diff to fail. - thelatemail
For 1, if you mean down your columns (the usual way to use diff), you just need sapply(df, diff). - alistaire
If you mean by rows for 1, t(apply(df, 1, function(x){diff(as.Date(x))})), though you'll lose your column names. Equivalent, but a little uglier, and keeps column names: t(sapply(1:nrow(df), function(x){diff(unlist(df[x,]))})) - alistaire
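
To illustrate the comments above, here is a minimal sketch on a small made-up data frame (not the asker's data) of why apply() trips over Date columns, plus one possible workaround. Converting each column with as.numeric() first is only one option, and the names m_chr, m_num and day_diffs are purely illustrative.

```r
# Toy data: two rows, three Date columns, one missing value
df <- data.frame(
  date1 = as.Date(c("2015-08-01", "2015-08-05")),
  date2 = as.Date(c("2015-08-31", NA)),
  date3 = as.Date(c("2015-09-07", "2015-09-10"))
)

# apply() calls as.matrix() first, which turns Date columns into character
m_chr <- as.matrix(df)
is.character(m_chr)

# Working on the underlying day counts keeps everything numeric
m_num <- sapply(df, as.numeric)

# Row-wise differences in days; each result row has one fewer column
day_diffs <- t(apply(m_num, 1, diff))
day_diffs
```

A difference next to a missing date comes out as NA, so the second row here is all NA.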

2 Answers

1
votes

You can try something like this:

apply(df, 1, function(x) identical(sort(as.Date(x)), as.Date(x[!is.na(x)])))

This gives the following output, which shows whether the dates in each row are in sorted order.

    2     3     4     5     6     7 
 TRUE  TRUE FALSE  TRUE  TRUE  TRUE 
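
The same check can also be sketched on the numeric day counts, which avoids the round trip through character. This is an alternative under my own assumptions, not the method above: the helper in_order() and the toy data are made up for illustration.

```r
# Toy data: row 1 is in order, row 2 goes backwards across a missing date
df <- data.frame(
  date1 = as.Date(c("2015-08-01", "2015-08-05")),
  date2 = as.Date(c("2015-08-31", NA)),
  date3 = as.Date(c("2015-09-07", "2015-08-01"))
)

# Hypothetical helper: TRUE when the non-missing values never decrease
in_order <- function(x) {
  x <- x[!is.na(x)]      # drop missing dates before comparing
  all(diff(x) >= 0)      # ties count as "in order"
}

# One logical per row, computed on the numeric day counts
ordered_rows <- apply(sapply(df, as.numeric), 1, in_order)
ordered_rows
```

Because missing dates are dropped before differencing, the two dates on either side of a gap are compared with each other, as in the sort()-based version.
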
1
votes

This will be simpler and quicker to process in long form I reckon:

dflong <- transform(
  stack(lapply(df, as.numeric)),
  date   = as.Date(values,origin="1970-01-01"),
  group  = seq_len(nrow(df)),
  ind    = NULL,
  values = NULL
)

dflong <- dflong[order(dflong$group),]

dflong$daysdiff <- with(dflong,
  ave(as.numeric(date), group, FUN=function(x) c(NA,diff(x)) ) 
)

#         date group daysdiff
#1  1940-03-28     1       NA
#7  2015-08-31     1    27549
#13 2015-09-07     1        7
#19 2015-09-23     1       16
#25 2015-10-01     1        8
#2  1932-09-03     2       NA
#8        <NA>     2       NA
#14 2015-09-10     2       NA

aggregate(daysdiff ~ group, data=dflong, function(x) any(x < 0, na.rm=TRUE) )

#  group daysdiff
#1     1    FALSE
#2     2    FALSE
#3     3     TRUE
#4     4    FALSE
#5     5    FALSE
#6     6    FALSE
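
For anyone unfamiliar with ave(), here is a toy sketch of the per-group differencing step on made-up numbers. The vectors days and group are hypothetical, and tapply() stands in for aggregate() just to keep the example short.

```r
# Made-up day counts for two groups of three dates each
days  <- c(10, 15, 30, 100, 90, 95)
group <- c(1, 1, 1, 2, 2, 2)

# ave() applies FUN within each group and returns a vector as long as the
# input; the leading NA marks the first date of a group, which has no predecessor
daysdiff <- ave(days, group, FUN = function(x) c(NA, diff(x)))
daysdiff

# Flag groups with at least one negative step, ignoring the NA entries
out_of_order <- tapply(daysdiff, group, function(x) any(x < 0, na.rm = TRUE))
out_of_order
```

With na.rm=TRUE, a difference that is NA because of a missing date is simply skipped, so only adjacent non-missing dates are compared.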