How do I subset a time series from the start up to the first occurrence of a variable meeting a condition?

library(dplyr)

# Sample data, assigned to df so the pipelines below can use it
df <- tribble(
  ~t, ~x, ~y,
  as.POSIXct(strptime("2011-03-27 01:30:00", "%Y-%m-%d %H:%M:%S")), -1, 1,
  as.POSIXct(strptime("2011-03-27 01:30:01", "%Y-%m-%d %H:%M:%S")), -5, 2,
  as.POSIXct(strptime("2011-03-27 03:45:00", "%Y-%m-%d %H:%M:%S")), -3, 5,
  as.POSIXct(strptime("2011-03-27 04:20:00", "%Y-%m-%d %H:%M:%S")), -8, 3,
  as.POSIXct(strptime("2011-03-27 04:25:00", "%Y-%m-%d %H:%M:%S")), -2, 8
)

For example, all rows from the start up to and including the first occurrence of y > 4 (I expect the first three rows of the sample data).

h3rm4n's solution explained

The simpler case, which excludes the first row that matches the condition, would be:

df %>% filter(cumsum(y > 4) == 0)

For every row before the first match, y > 4 is FALSE, which counts as 0 in R, so cumsum(y > 4) stays at 0 and cumsum(y > 4) == 0 is TRUE. filter() keeps those rows. The first row with y > 4 adds 1 to the running sum, so the condition is FALSE for that row and for every row after it.
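
To make the intermediate values visible, here is a minimal sketch on the y column alone. The vector `y` just copies the sample data, and the names `hit`, `running` and `keep` are only for illustration:

```r
# y column copied from the sample data
y <- c(1, 2, 5, 3, 8)

hit     <- y > 4         # FALSE FALSE  TRUE FALSE  TRUE
running <- cumsum(hit)   # 0 0 1 1 2  (TRUE counts as 1, FALSE as 0)
keep    <- running == 0  # TRUE  TRUE FALSE FALSE FALSE

y[keep]                  # 1 2  (the matching row itself is dropped)
```

Once the running sum leaves 0 it never returns, so later rows that fail the condition (here y = 3) stay excluded.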

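To include the matching row as well, we additionally apply lag(y, default = 0). This shifts y down by one row, so each row is tested against the previous row's value and the running sum only becomes nonzero on the row after the first match. default = 0 fills in the first row, which has no predecessor, with a value that does not satisfy > 4.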
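
The effect of the shift can be sketched the same way on the y column alone. This assumes dplyr is installed; `prev`, `running` and `keep` are illustrative names:

```r
library(dplyr)

# y column copied from the sample data
y <- c(1, 2, 5, 3, 8)

prev    <- dplyr::lag(y, default = 0)  # 0 1 2 5 3  (previous row's y)
running <- cumsum(prev > 4)            # 0 0 0 1 1
keep    <- running == 0                # TRUE  TRUE  TRUE FALSE FALSE

y[keep]                                # 1 2 5  (first match is now included)
```

Note that default = 0 only works as a filler because 0 does not satisfy the condition. For a different condition, the default would need to be a value that fails it.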

1 Answer

You can do the following:

df %>% filter(!cumsum(lag(y, default = 0) > 4))

The result:

# A tibble: 3 x 3
                    t     x     y
               <dttm> <dbl> <dbl>
1 2011-03-27 01:30:00    -1     1
2 2011-03-27 01:30:01    -5     2
3 2011-03-27 03:45:00    -3     5
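
The `!` works here because R coerces the running sum to logical before negating it: 0 becomes FALSE and any nonzero value becomes TRUE, so `!cumsum(...)` is TRUE exactly where the sum is still 0. A small sketch to check that the `!` form and the explicit `== 0` form select the same rows follows. It rebuilds the sample data with `tibble()` and a fixed `tz = "UTC"`, both of which are assumptions made only to keep the example self-contained:

```r
library(dplyr)

# Sample data rebuilt for a self-contained check (tz = "UTC" is an assumption)
df <- tibble(
  t = as.POSIXct(c("2011-03-27 01:30:00", "2011-03-27 01:30:01",
                   "2011-03-27 03:45:00", "2011-03-27 04:20:00",
                   "2011-03-27 04:25:00"), tz = "UTC"),
  x = c(-1, -5, -3, -8, -2),
  y = c(1, 2, 5, 3, 8)
)

# Negation form from the answer
a <- df %>% filter(!cumsum(lag(y, default = 0) > 4))

# Explicit comparison form
b <- df %>% filter(cumsum(lag(y, default = 0) > 4) == 0)

nrow(a)                 # 3
isTRUE(all.equal(a, b)) # TRUE
```

Which form to use is a matter of taste; `== 0` spells out the intent, while `!` is shorter.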