aggregating time series in R

Question

I have the following OHLC data (by 3-minute intervals)

library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
  Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL


>tickmin
2010-09-08 15:02:00        20
2010-09-08 15:04:00        77
2010-09-08 15:08:00        86
2010-09-08 15:11:00         7
2010-09-08 15:14:00        43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"

I am trying to aggregate it using the following:

> by <-timeSequence(from = start(tickmin), to = end(tickmin), format="%Y-%m-%d %H%M", by = "day")
>by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]

> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE

..would appreciate any suggestions on how I can fix the error.

Please indicate what package(s) you're using and the structure of your objects. Or do us even better, post a reproducible example. — Roman Luštrik
if you need specifics of the example, I can do it, otherwise it is a lot of data for a reproducible example. — itcplpl
-1 Your example is entirely unclear. str(tickmin) shows an xts object with 5 columns, yet tickmin only has one column when you print it on the next line. The timeSequence function does not exist in any of the 3 packages you list. — Joshua Ulrich

mathematical.coffee mathematical.coffee · Accepted Answer · 2012-01-17T00:44:18

I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!

From the error message, the length of your by is not the same length as Vo(tickmin). You have to generate your by to have one value per corresponding value in tickmin, with the day.

As an example here I generate an xts object:

# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0,3600*24*30,by=60*20),origin="2010-06-30")
n   <- length(tms)
# generate volumes for those intervals, random 0 -- 100, turn into xts object
xts.ts <- xts(sample.int(100,n,replace=T),tms)
colnames(xts.ts)<-'Volume'

which yields:

> head(xts.ts)
                    Volume
2010-06-30 00:00:00     97
2010-06-30 00:20:00     78
2010-06-30 00:40:00     38
2010-06-30 01:00:00     86
2010-06-30 01:20:00     79
2010-06-30 01:40:00     55

To access the dates of xts.ts you use index(xts.ts) which gives a whole bunch of strings of the date, e.g. "2010-07-30 00:00:00 EST".

To round these to the nearest day you can use as.Date:

> as.Date(index(xts.ts))
   [1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
    ....

Solution to your problem

Then to use aggregate you do:

> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)

2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...

Better solution to your problem

The xts package has functions apply.daily, apply.monthly, etc (use ls('package:xts') to see what functions it has -- there may be ones you're interested in).

apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily. To use it you can do:

> apply.daily(xts.ts,sum)

                    Volume
2010-06-30 23:40:00   4005
2010-07-01 23:40:00   4093
2010-07-02 23:40:00   3419
2010-07-03 23:40:00   3737
...

Or if your xts object has other columns like Open, Close etc, you can do apply.daily(xts.ts, function(x) sum(Vo(x))).

Note that the answers are slightly different using apply.daily to the aggregate ... as.Date method. That's because apply.daily goes daily from start(xts.ts) to end(xts.ts) (more or less) whereas aggregate just went by day from midnight to midnight.

Looking at your question, apply.daily seems to match most closely what you want to do (and is provided with xts anyway, so why not use it?)

aggregating time series in R

1 Answers

Solution to your problem

Better solution to your problem