4 votes

I have a financial time series in R (currently an xts object, but I'm also looking into tibble right now).

How do I find the probability of 2 adjacent rows matching a condition?

For example, I want to know the probability of 2 consecutive days having a value higher than the mean or median. I know I can lag the previous day's value into the next row, which would let me compute this statistic, but that seems cumbersome and inflexible.

Is there a better way to get this done?

xts sample data:

foo <- xts(x = c(1,1,5,1,5,5,1), seq(as.Date("2016-01-01"), length = 7, by = "days"))

What's the probability of 2 consecutive days having a higher than median value?

2
Please provide a minimal reproducible example. - Heikki
I added minimal xts sample data. - TommyF

2 Answers

1 vote

You can create a new column that flags the values higher than the median, and then keep only the rows where both that row and the previous row are flagged.

library(tibble)
foo <- tibble(x = c(1,1,5,1,5,5,1), date = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))

Step 1

Create a column that flags the values higher than the median:

foo$higher_than_median <- foo$x > median(foo$x)

Step 2

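Compare each row of that column with the previous row using diff.

Keep a row only when its flag equals the previous row's flag (both higher or both not higher): c(0, diff(foo$higher_than_median)) == 0

The leading 0 pads the result back to the original length. The first row has no predecessor, so it would be miscounted if the first value were above the median (it is not in this sample).

Then add the condition that the row itself must be higher: foo$higher_than_median == TRUE

Full Expression:

foo$both_higher <- c(0, diff(foo$higher_than_median)) == 0 & foo$higher_than_median == TRUE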

Step 3

To find the probability, take the mean of foo$both_higher. Note that this divides by all 7 rows, although the series contains only 6 adjacent pairs:

mean(foo$both_higher)
[1] 0.1428571
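
The same count can also be sketched in base R on a plain vector, which makes the choice of denominator explicit. This is only an illustration under the assumption that the values are available as a numeric vector; the names above, both, prob_pairs and prob_rows are hypothetical and not part of the answer above.

```r
# Sample values from the question, as a plain numeric vector
x <- c(1, 1, 5, 1, 5, 5, 1)

# TRUE where a value is strictly above the full-sample median
above <- x > median(x)

# Pair each day with the next one: day i is head(above, -1)[i],
# day i + 1 is tail(above, -1)[i]
both <- head(above, -1) & tail(above, -1)

# Share of adjacent pairs in which both days are above the median
prob_pairs <- mean(both)

# Same count divided by the number of rows, as in the steps above
prob_rows <- sum(both) / length(x)
```

With this sample there is a single qualifying pair, so the two denominators give 1/6 and 1/7 respectively; which one is appropriate depends on whether the probability is meant per pair or per day.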
1 vote

Here is a pure xts solution.

How do you define the median? There are several ways.

In an online time series setting, such as computing a moving average, you can compute the median over a fixed lookback window (shown below) or from the origin up to the current time step (an anchored window). Either way, the median computation never uses values beyond the current time step, which avoids look-ahead bias:

library(xts)
library(TTR)

x <- rep(c(1,1,5,1,5,5,1, 5, 5, 5), 10)
y <- xts(x = x, seq(as.Date("2016-01-01"), length = length(x), by = "days"), dimnames = list(NULL, "x"))

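# Avoid look-ahead bias in an online time series application by ranking each
# value within a rolling fixed-length window.
nMedLookback <- 5
# Despite its name, y$med holds the percent rank of the current value within
# the last nMedLookback observations, not the median itself. A rank above 0.5
# is used here as a proxy for lying above the window median.
y$med <- runPercentRank(y[, "x"], n = nMedLookback)
y$isAboveMed <- y$med > 0.5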

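# Count how many of the last nSum days were above the window median.
nSum <- 2
y$runSum2 <- runSum(y$isAboveMed, n = nSum)

# Drop the warm-up rows that contain NA, then take the share of days on which
# all nSum most recent days were above the median.
z <- na.omit(y)
prob <- sum(z[,"runSum2"] >= nSum) / NROW(z)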
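
The anchored (expanding) window mentioned earlier is not shown in the code above. A minimal base R sketch of that variant, assuming the values are available as a plain numeric vector rather than an xts object, could look like this; the names anchored_med, above_anchored, both_anchored and prob_anchored are hypothetical.

```r
# Same values as the xts example, kept as a plain vector for simplicity
x <- rep(c(1, 1, 5, 1, 5, 5, 1, 5, 5, 5), 10)

# Median of everything from the first observation up to and including step i,
# so no future values enter the calculation
anchored_med <- sapply(seq_along(x), function(i) median(x[seq_len(i)]))

# TRUE where the current value is strictly above its anchored median
above_anchored <- x > anchored_med

# Adjacent pairs (day i, day i + 1) in which both days are above
both_anchored <- head(above_anchored, -1) & tail(above_anchored, -1)

# Share of adjacent pairs that qualify
prob_anchored <- mean(both_anchored)
```

Recomputing the median from scratch at every step is quadratic in the series length, which is fine for a sketch but may be slow on long series.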

If the median is instead taken over the entire data set, the rolling-window code simplifies considerably, at the cost of using future values in the median.
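
For the full-sample case, one possible sketch uses the sample data from the question and xts's lag to pair each day with the previous one. It assumes the xts package is installed; the column names isAbove and both and the variables full_median, pairs and prob_full are hypothetical.

```r
library(xts)

# Sample data from the question
foo <- xts(c(1, 1, 5, 1, 5, 5, 1),
           order.by = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))
colnames(foo) <- "x"

# Median over the whole series (this looks ahead, unlike the rolling version)
full_median <- median(coredata(foo$x))

# 1 where the value is strictly above the median, 0 otherwise
foo$isAbove <- foo$x > full_median

# 1 only when both the current and the previous day are above the median;
# the first row is NA because it has no previous day
foo$both <- foo$isAbove * lag(foo$isAbove, 1)

# Drop the NA row and average over the remaining adjacent pairs
pairs <- na.omit(foo$both)
prob_full <- mean(coredata(pairs))
```

Because the first row is dropped, the result is a share of adjacent pairs rather than of rows.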