3
votes

I have a data frame that has 4 columns of dates. The dates should be in order: col1 occurs first, col2 second, col3 third, and col4 last. I'd like to identify which rows have dates that are not in sequence.

Here is a toy data frame:

col1 <- c(as.Date("2004-1-1"), as.Date("2005-1-1"), as.Date("2006-1-1"))
col2 <- c(as.Date("2004-1-2"), as.Date("2005-1-3"), as.Date("2006-1-2"))
col3 <- c(as.Date("2004-1-5"), as.Date("2005-1-9"), as.Date("2006-1-19"))
col4 <- c(as.Date("2004-1-9"), as.Date("2005-1-15"), as.Date("2006-1-10"))
dates <- data.frame(col1, col2, col3, col4)

dates

        col1       col2       col3       col4
1 2004-01-01 2004-01-02 2004-01-05 2004-01-09
2 2005-01-01 2005-01-03 2005-01-09 2005-01-15
3 2006-01-01 2006-01-02 2006-01-19 2006-01-10

My desired output would be:

        col1       col2       col3       col4 Seq?
1 2004-01-01 2004-01-02 2004-01-05 2004-01-09    T
2 2005-01-01 2005-01-03 2005-01-09 2005-01-15    T
3 2006-01-01 2006-01-02 2006-01-19 2006-01-10    F

3 Answers

6
votes

I can think of a couple of solutions. Naively, I'd suggest using apply with ?is.unsorted, which is documented as:

Test if an object is not sorted (in increasing order), without the cost of sorting it.

!apply(dates, 1, is.unsorted)
#[1]  TRUE  TRUE FALSE
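One detail worth knowing about the apply approach: is.unsorted treats ties as sorted by default, and it has a strictly argument for when equal dates should count as out of sequence. A minimal sketch, assuming a small two-column data frame d (the names d, in_order and in_order_strict are made up for illustration):

```r
# Hypothetical toy data: row 1 has a tie, row 2 is strictly increasing,
# row 3 is out of order.
d <- data.frame(
  a = as.Date(c("2004-01-01", "2005-01-01", "2006-01-05")),
  b = as.Date(c("2004-01-01", "2005-01-03", "2006-01-02"))
)

# apply() coerces the data frame to a character matrix; dates formatted as
# "YYYY-MM-DD" still compare correctly as strings.
in_order <- !apply(d, 1, is.unsorted)

# Extra arguments to apply() are passed on to is.unsorted().
in_order_strict <- !apply(d, 1, is.unsorted, strictly = TRUE)
```

With the default, the tied row counts as in sequence; with strictly = TRUE it does not.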

Otherwise, convert to long format and then do a grouped operation by row, which should be faster on larger datasets:

tmp <- cbind(row=seq_len(nrow(dates)), stack(lapply(dates, as.vector)))
!tapply(tmp$values, tmp$row, FUN=is.unsorted)

And finally, the brute force method of comparing each column with the next via Map, which should be even quicker again:

Reduce(`&`, Map(`<`, dates[-length(dates)], dates[-1]))
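To make the Map/Reduce one-liner easier to follow, here it is broken into steps on the toy data from the question. The intermediate names left, right, pairwise and in_seq are only for illustration. Note that `<` is strict, so tied dates are flagged as out of sequence; swapping in `<=` would allow ties.

```r
# Toy data from the question.
dates <- data.frame(
  col1 = as.Date(c("2004-01-01", "2005-01-01", "2006-01-01")),
  col2 = as.Date(c("2004-01-02", "2005-01-03", "2006-01-02")),
  col3 = as.Date(c("2004-01-05", "2005-01-09", "2006-01-19")),
  col4 = as.Date(c("2004-01-09", "2005-01-15", "2006-01-10"))
)

left  <- dates[-length(dates)]  # col1, col2, col3
right <- dates[-1]              # col2, col3, col4

# One logical vector per adjacent pair of columns:
# col1 < col2, col2 < col3, col3 < col4.
pairwise <- Map(`<`, left, right)

# A row is in sequence only if every pairwise comparison is TRUE.
in_seq <- Reduce(`&`, pairwise)

# Attach the flag as a new column, as in the desired output.
dates$Seq <- in_seq
```

Because each comparison is vectorised over whole columns, the work scales with the number of columns rather than looping over rows.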
4
votes

A simple apply call will do the trick:

dates$Seq <- apply(dates, 1, function(x) all(x == sort(x)))
2
votes

rowSums(Reduce(pmax, dates, accumulate = TRUE) == dates) == NCOL(dates)
#[1]  TRUE  TRUE FALSE

Reduce with pmax computes the running maximum date across the columns for each row. With accumulate = TRUE, Reduce keeps the result of every iteration, and these are compared with the original data in dates. A row is in sequence when every column equals its running maximum.

Another approach introduces NA as soon as a date is out of order:
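As a rough illustration of what the accumulated pmax looks like, the sketch below inspects the intermediate list on the toy data from the question. The name running_max is made up for this example.

```r
# Toy data from the question.
dates <- data.frame(
  col1 = as.Date(c("2004-01-01", "2005-01-01", "2006-01-01")),
  col2 = as.Date(c("2004-01-02", "2005-01-03", "2006-01-02")),
  col3 = as.Date(c("2004-01-05", "2005-01-09", "2006-01-19")),
  col4 = as.Date(c("2004-01-09", "2005-01-15", "2006-01-10"))
)

# A list with one element per column; element k holds, for each row,
# the largest date seen in columns 1..k.
running_max <- Reduce(pmax, dates, accumulate = TRUE)

# Up to col3, every row equals its running maximum.
ok_col3 <- running_max[[3]] == dates$col3

# In row 3 the running maximum stays at col3's date, which is later than
# col4, so the comparison fails there.
ok_col4 <- running_max[[4]] == dates$col4
```

A row passes only when the comparison holds in every column, which is what the rowSums(...) == NCOL(dates) check expresses.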

!is.na(Reduce(function(x, y) ifelse(x > y | is.na(x), NA, y), dates))
#[1]  TRUE  TRUE FALSE
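A side effect of the NA-based approach is that a row containing a genuinely missing date is also reported as not in sequence, because the NA propagates through the remaining comparisons. A small sketch, assuming a hypothetical data frame d with one missing value (d and flag are illustrative names):

```r
# Hypothetical data: row 1 is in order, row 2 has a missing middle date.
d <- data.frame(
  a = as.Date(c("2004-01-01", "2005-01-01")),
  b = as.Date(c("2004-01-02", NA)),
  c = as.Date(c("2004-01-03", "2005-01-09"))
)

# Same expression as above: once a comparison is NA or fails, the running
# value becomes NA and stays NA for the rest of the columns.
flag <- !is.na(Reduce(function(x, y) ifelse(x > y | is.na(x), NA, y), d))
```

Here row 2 is flagged FALSE even though its two known dates are in order, so missing values may need separate handling depending on what "in sequence" should mean for incomplete rows.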