Find adjacent rows that match condition

Question

I have a financial time series in R (currently an xts object, but I'm also looking into tibble right now).

How do I find the probability of 2 adjacent rows matching a condition?

For example, I want to know the probability of 2 consecutive days having a higher-than-mean/median value. I know I can lag the previous day's value into the next row, which would allow me to get this statistic, but that seems very cumbersome and inflexible.

Is there a better way to get this done?

xts sample data:

foo <- xts(x = c(1,1,5,1,5,5,1), seq(as.Date("2016-01-01"), length = 7, by = "days"))

What's the probability of 2 consecutive days having a higher than median value?

Matt W. · Accepted Answer · 2017-11-23T08:03:52

You can create a new column that flags which values are higher than the median, and then keep only the rows where both that row and the previous one are flagged.

> foo <- tibble::as_tibble(data.table::data.table(x = c(1,1,5,1,5,5,1), date = seq(as.Date("2016-01-01"), length.out = 7, by = "days")))

Step 1

Create column to find those that are higher than median

> foo$higher_than_median <- foo$x > median(foo$x)

Step 2

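Compare each row of that column with the previous one using diff. A difference of 0 means the two consecutive rows have the same flag (both higher, or both not higher); the leading 0 pads the first row, which has no predecessor:

c(0, diff(foo$higher_than_median)) == 0

Then add the condition that they must both be higher:

foo$higher_than_median == TRUE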

Full Expression:

foo$both_higher <- c(0, diff(foo$higher_than_median)) == 0 & foo$higher_than_median == TRUE
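
As a cross-check, the same flag can be built in base R by comparing the column with a shifted copy of itself instead of using diff. This is only a sketch: the names higher, prev_higher and both_higher are illustrative and not part of the answer above, and it works on a plain vector rather than the tibble.

```r
# Sample values from the question
x <- c(1, 1, 5, 1, 5, 5, 1)

# TRUE where the value is above the median
higher <- x > median(x)

# Shift the flags down by one row; the first row has no predecessor,
# so it is padded with FALSE
prev_higher <- c(FALSE, head(higher, -1))

# TRUE where this row and the previous row are both above the median
both_higher <- higher & prev_higher
both_higher
```

On this data the result matches the diff-based expression. The two differ only when the very first row is above the median: the diff version pads with 0 and would then flag row 1 even though it has no previous row, while the FALSE padding here would not.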

Step 3

To find the probability, take the mean of foo$both_higher:

mean(foo$both_higher)
[1] 0.1428571
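
Note that the mean above divides by all 7 rows, although 7 rows contain only 6 adjacent pairs. If the share of adjacent pairs is what you want, one possible base R sketch is shown below. The helper consecutive_share is a hypothetical name introduced here for illustration; it also generalises to runs of k consecutive rows, which may address the flexibility concern in the question.

```r
# Sample values from the question
x <- c(1, 1, 5, 1, 5, 5, 1)
higher <- x > median(x)

# Pair each row with its successor: 6 pairs for 7 rows
pair_both_higher <- head(higher, -1) & tail(higher, -1)
pair_share <- mean(pair_both_higher)

# Hypothetical helper: share of windows of k consecutive rows
# in which every row satisfies the condition
consecutive_share <- function(flag, k = 2) {
  # embed() puts each window of k consecutive values in one matrix row
  windows <- embed(as.integer(flag), k)
  mean(rowSums(windows) == k)
}

pair_share                    # 1 of 6 pairs, about 0.1667
consecutive_share(higher, 2)  # same value as pair_share
consecutive_share(higher, 3)  # 0: no run of three in this sample
```

Which denominator is appropriate (rows or pairs) depends on how you define the probability, so treat this as one option rather than a correction.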
