I want to tally the number of consecutive observations that match a condition. For instance, in foo below I would like to tally the number of days in March on which consecutive values of y are more than one standard deviation below the mean value of y for that month. My data are laid out like foo:

library(lubridate)
library(dplyr)  # needed for %>%, filter(), group_by() and tally() below
foo <- data.frame(x=seq.Date(as.Date("1981/1/1"), 
                             as.Date("2000/12/31"), "day"))
foo$y <- arima.sim(n = nrow(foo), list(ar = c(0.8)))

I've figured out how to tally the number of days in March for each year where y is more than one standard deviation below the mean:

bar <- foo %>% filter(month(x) == 3 & y < mean(y)-sd(y)) %>% 
  group_by(year(x)) %>% tally()

I would like this count to include only the days matching the condition that are consecutive. E.g., if the mean temperature for March is 0 and the sd is 1, and March 5, 6 and 7 in the year 1990 are all below -1, the tally would be 3 for the year 1990. If March 21 was also < -1 but March 20 and 22 were not, the tally would still be 3 because March 21 doesn't have neighbors that are also < -1.

I imagine rle comes into play but I don't understand how.

Any advice appreciated.

1 Answer

So this should work.

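library(dplyr)
library(tidyr)  # for separate()

foo %>%
  # split the date into character columns "year", "month" and "day"
  separate(x, sep = "-", into = c("year", "month", "day")) %>%
  filter(month == "03") %>%
  group_by(year) %>%
  mutate(z = y < mean(y) - sd(y),  # TRUE where y is more than 1 sd below that March's mean
         g = {r <- rle(z)
              r$values[r$lengths < 2] <- FALSE  # runs of length 1 no longer count
              inverse.rle(r)}) %>%
  tally(g)  # sum the remaining TRUE days per year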

# A tibble: 20 x 2
   year      n
   <chr> <int>
 1 1981      2
 2 1982      6
 3 1983      4
 4 1984      4
 5 1985      3
 6 1986      5
 7 1987      3
 8 1988      7
 9 1989      5
10 1990      4
11 1991      7
12 1992      4
13 1993      6
14 1994      5
15 1995      3
16 1996      5
17 1997      5
18 1998      4
19 1999      6
20 2000      6

I have kept z and g as separate columns so you can check the result (drop the final tally() to inspect them).
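
To check the run-filtering step in isolation, here is a minimal sketch on a toy vector that mirrors the example in the question: days 5, 6, 7 and 21 of a 31-day March meet the condition. The names z, r and g follow the code above; the toy data are assumed purely for illustration.

```r
# Toy March: TRUE marks a day that meets the condition (assumed data).
z <- rep(FALSE, 31)
z[c(5, 6, 7, 21)] <- TRUE

r <- rle(z)                       # runs of identical values
r$values[r$lengths < 2] <- FALSE  # runs of length 1 no longer count
g <- inverse.rle(r)               # expand back to one value per day

sum(g)    # 3: days 5, 6 and 7
which(g)  # 5 6 7
```

Day 21 is dropped because it forms a run of length 1, which is the behaviour asked for in the question.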

UPDATE: rle takes a sequence and returns an object with two elements: 1. lengths - the number of times an element is repeated consecutively in the sequence. 2. values - the corresponding value of each run.

Take this example:

seq <- c("a", "a", "a", "b", "b", "c")

rle_obj <- rle(seq)

rle_obj
Run Length Encoding
  lengths: int [1:3] 3 2 1
  values : chr [1:3] "a" "b" "c"

Now you can manipulate the runs. For example, turn the run of "b" into a run of 4 instead of 2:

rle_obj$lengths[rle_obj$values == "b"] <- 4

inverse.rle(rle_obj)
[1] "a" "a" "a" "b" "b" "b" "b" "c"
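
The lengths and values can also be used directly for counting, without rebuilding the sequence. The sketch below counts the TRUE values that sit in runs of at least a given length; count_in_runs and its min_len argument are hypothetical names made up for this illustration, not part of base R or dplyr.

```r
# Hypothetical helper: count TRUE values that sit in runs of at least
# `min_len` consecutive TRUEs. NA values are not handled here.
count_in_runs <- function(z, min_len = 2) {
  r <- rle(z)
  sum(r$lengths[r$values & r$lengths >= min_len])
}

z <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
count_in_runs(z)               # 5: the runs of 3 and 2
count_in_runs(z, min_len = 3)  # 3: only the run of 3
```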

Hope that gave you some insight.