I am trying to extract interesting statistics from an irregularly sampled time series data set, but I'm coming up short on the right tools for the job. Tools for manipulating regularly sampled time series, or index-based series of any kind, are pretty easy to find, but I'm not having much luck with the problems I'm trying to solve.
First, a reproducible data set:
library(zoo)
set.seed(0)
nSamples <- 5000
vecDT    <- rexp(nSamples, 3)                  # exponential inter-arrival times
vecTimes <- cumsum(c(0, vecDT))                # irregular observation times
vecDrift <- c(0, rnorm(nSamples, mean = 1/nSamples, sd = 0.01))
vecVals  <- cumsum(vecDrift)                   # random walk with slight drift
vecZ     <- zoo(vecVals, order.by = vecTimes)
rm(vecDT, vecDrift)
Assume the times are in seconds. The `vecZ` series spans almost 1700 seconds (just shy of 30 minutes) and contains 5001 entries. (NB: I'd try using `xts`, but `xts` seems to need date information, and I'd rather not attach a particular date when it's not relevant.)
My goals are the following:
1. Identify the indices of the values 3 minutes before and 3 minutes after each point. Since the times are continuous, I doubt that any two points are precisely 3 minutes apart. What I'd like to find are the points that are at most 3 minutes prior to, and at least 3 minutes after, a given point, i.e. something like the following (in pseudocode):

backIX(t, vecZ, tDelta)    = min{ ix in 1:length(vecZ) : t - time(ix) < tDelta }
forwardIX(t, vecZ, tDelta) = min{ ix in 1:length(vecZ) : time(ix) - t > tDelta }
So, for 3 minutes, `tDelta = 180`. If `t = 2500`, then the result of `forwardIX()` would be 3012 (i.e. `time(vecZ)[2500]` is 860.1462 and `time(vecZ)[3012]` is 1040.403, or just over 180 seconds later), and the output of `backIX()` would be 2020 (corresponding to time 680.7162 seconds). Ideally, I would like to use a function that does not require `t`, as calling one per point would take `length(vecZ)` calls and ignore the fact that sliding windows of time can be calculated more efficiently.

2. Apply a function to all values in a rolling window of time. I've seen `rollapply`, which takes a fixed window size (i.e. a fixed number of indices, not a fixed window of time). I can solve this the naive way, with a loop (or `foreach` ;-)) calculated per index `t`, but I wondered if there are simple functions already implemented, e.g. a function to calculate the mean of all values in a given time frame. Since this can be done efficiently via simple summary statistics that slide over a window, it should be computationally cheaper than a function that accesses all of the data multiple times to calculate each statistic. Some fairly natural functions: mean, min, max, and median.

Even if the window isn't varying by time, the ability to vary the window size would be adequate, and I can find that window size using the result of the first goal. However, that still seems to require excess calculations, so being able to specify time-based intervals directly seems more efficient.
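For the first goal, one vectorized approach I've sketched (my own helper, not from any package) uses `findInterval()` on the raw time vector; it relies on the times being sorted, which they are here by construction. The 2020 / 3012 indices below are the ones quoted in the example above, with the same seed:

```r
library(zoo)
set.seed(0)
nSamples <- 5000
vecTimes <- cumsum(c(0, rexp(nSamples, 3)))
vecVals  <- cumsum(c(0, rnorm(nSamples, mean = 1/nSamples, sd = 0.01)))
vecZ     <- zoo(vecVals, order.by = vecTimes)

tms    <- as.numeric(time(vecZ))
tDelta <- 180

# findInterval(x, tms) counts how many times are <= x, so "that count + 1"
# is the first index with time strictly greater than x.
ixBack    <- findInterval(tms - tDelta, tms) + 1   # first index within tDelta before
ixForward <- findInterval(tms + tDelta, tms) + 1   # first index more than tDelta after
ixForward[ixForward > length(tms)] <- NA           # window runs past the series end

ixBack[2500]     # 2020, per the example above
ixForward[2500]  # 3012, per the example above
```

This computes both endpoint vectors for all 5001 points in two `findInterval()` calls, rather than one search per point.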
Are there packages in R that facilitate such manipulations of data in time-windows, or am I out of luck and I should write my own functions?
Note 1: This question seeks to do something similar, except over disjoint intervals, rather than rolling windows of time, e.g. I could adapt this to do my analysis on every successive 3 minute block, but I don't see a way to adapt this for rolling 3 minute intervals.
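For completeness, the disjoint-interval version mentioned in Note 1 needs no package at all: bucket each observation by `floor(time / 180)` and aggregate. A sketch on the same simulated data (the block size 180 is just the 3-minute window from above):

```r
set.seed(0)
nSamples <- 5000
vecTimes <- cumsum(c(0, rexp(nSamples, 3)))
vecVals  <- cumsum(c(0, rnorm(nSamples, mean = 1/nSamples, sd = 0.01)))

# Successive, non-overlapping 3-minute blocks: block k covers [180k, 180(k+1))
block      <- floor(vecTimes / 180)
blockMeans <- tapply(vecVals, block, mean)
```

The hard part, as noted, is that this does not generalize to rolling windows.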
Note 2: I've found that switching from a `zoo` object to a plain numeric vector (for the times) significantly speeds up the range-finding / window-endpoint identification in the first goal. It's still a naive algorithm, but it's worth mentioning that working with `zoo` objects may not be optimal for the naive approach.
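Building on that numeric-vector observation: once the window-start indices are known, a trailing-window *mean* needs only one cumulative-sum pass rather than re-summing every window. A sketch (my own, not a packaged function; `ix` is the window-start vector from a `findInterval()` call):

```r
set.seed(0)
nSamples <- 5000
vecTimes <- cumsum(c(0, rexp(nSamples, 3)))
vecVals  <- cumsum(c(0, rnorm(nSamples, mean = 1/nSamples, sd = 0.01)))

tms    <- vecTimes
tDelta <- 180
ix     <- findInterval(tms - tDelta, tms) + 1   # first index in each trailing window

# One cumulative-sum pass gives every trailing-window mean in O(n):
csum <- cumsum(c(0, vecVals))                   # csum[k + 1] == sum(vecVals[1:k])
i    <- seq_along(tms)
rollMeanT <- (csum[i + 1] - csum[ix]) / (i - ix + 1)
```

The same trick works for any statistic with an invertible running summary (sum, count, sum of squares); min, max, and median need a different structure such as a deque or a rolling order statistic.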
`xts` is probably the way to go. See `?endpoints`, `?to.period`, `?period.apply` and `?split.xts`. Coerce your object to xts like this: `x <- .xts(vecVals, vecTimes)` – GSee

… `xts` … do that. – Iterator

… `na.locf` to get your data to be strictly regular. Then use `rollapply`. – GSee

`rollapply` supports `width` as a list - I just need to figure out how to get that list, I suppose. – Iterator
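As I read `?rollapply`, a list-valued `width` supplies per-point offset vectors, so the list the last comment asks about can be built from the window-start indices. A sketch under that reading (`ixBack` via `findInterval()` is my own helper, not something from the comments):

```r
library(zoo)
set.seed(0)
nSamples <- 5000
vecTimes <- cumsum(c(0, rexp(nSamples, 3)))
vecVals  <- cumsum(c(0, rnorm(nSamples, mean = 1/nSamples, sd = 0.01)))
vecZ     <- zoo(vecVals, order.by = vecTimes)

tms    <- vecTimes
tDelta <- 180
ixBack <- findInterval(tms - tDelta, tms) + 1   # first index in each trailing window

# Component i holds offsets relative to point i, so window i covers ixBack[i]..i
offsets <- lapply(seq_along(tms), function(i) (ixBack[i] - i):0)
rollMed <- rollapply(vecZ, width = offsets, FUN = median)
```

This applies an arbitrary function (here `median`) over a genuinely time-based rolling window, at the cost of materializing one offset vector per point.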