1
votes

I have a multi-year (1985-2010) time series of daily data that I would like to aggregate at 8-day intervals. The problem is that I'm interested in analyzing annual results, so the first interval of each year should begin on January 1.

If I construct an example zoo object:

library(zoo)
indices <- seq.Date(as.Date("1985-01-01"), as.Date("1988-12-31"), by = 'day')
a.zoo <- zoo(rnorm(length(indices)), order.by = indices)

head(a.zoo)
 1985-01-01  1985-01-02  1985-01-03  1985-01-04  1985-01-05  1985-01-06 
 0.47454560 -1.10429098 -1.27926702  0.46199385 -0.12975014  0.03752185 

then I can use rollapply to get part of the way there:

rollapply(a.zoo, 8, by=8, by.column=FALSE, FUN=function(x) mean(x), align = "left")

but there is no distinction between years, so the start date of the first annual interval varies. If I transform the zoo object into a data frame I can use a plyr command to apply the function by year:

library(plyr)
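a.df <- data.frame(date = time(a.zoo), 
                    data = coredata(a.zoo), 
                    check.names = FALSE, 
                    row.names = NULL)
# Split each year's values into consecutive blocks of 8 days
a.8 <- dlply(a.df, .(format(date, "%Y")), 
            function(x) {split(x$data, ceiling(seq_along(x$data)/8))})
# Mean of every block; the names take the form year.block
a8.mean <- rapply(a.8, mean, na.rm = TRUE)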

head(a8.mean)
    1985.1     1985.2     1985.3     1985.4     1985.5     1985.6 
-0.2744355  0.3103211  0.2057675 -0.1537141  0.6807115 -0.1581474 

but I lose the date information. Does anyone have any suggestions for how to tweak one approach or the other (or can offer a new, more brilliant idea) so that I wind up with time-tagged data at 8-day intervals that begin on January 1st each year? Thanks for any help!


2 Answers

1
votes

I based this solution on another SO answer. Basically, split the zoo object by year:

a.yr = tapply(a.zoo, format(index(a.zoo), "%Y"), c)

Then apply rollapply to each year separately, as you were doing. For 1985, for example:

rollapply(a.yr$`1985`, 8, by=8, by.column=FALSE, FUN=function(x) mean(x), align = "left")

You can then combine the per-year zoo objects into a single series, for example with rbind.
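The per-year steps could be put together roughly as follows. This is only a sketch under some assumptions: it subsets by year with a logical index instead of tapply, and the names years, by.year and a8.zoo are made up for illustration. Note that rollapply drops an incomplete final window by default, so the last 5 or 6 days of each year are not included here.

```r
library(zoo)

# Example data, as in the question
indices <- seq.Date(as.Date("1985-01-01"), as.Date("1988-12-31"), by = "day")
a.zoo <- zoo(rnorm(length(indices)), order.by = indices)

# Calendar year of every observation
years <- format(index(a.zoo), "%Y")

# Non-overlapping 8-day means within each year. With align = "left",
# each mean is stamped with the first date of its window, so every
# year's first interval starts on January 1.
by.year <- lapply(unique(years), function(yr) {
  rollapply(a.zoo[years == yr], width = 8, by = 8, FUN = mean, align = "left")
})

# Stack the per-year series back into one zoo object
a8.zoo <- do.call(rbind, by.year)
head(a8.zoo)
```

The result is still a zoo object, so the dates are kept as the index rather than as names.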

1
votes

This might not be the best answer, but you can stay with your second approach: extract the first date of each 8-day block within each year, then assign those dates as the names of your a8.mean result.

a8.name <- (dlply(a.df, .(format(date, "%Y")), function(x) x$date[seq_along(x$date) %% 8 == 1]))
names(a8.mean) <- do.call(c, a8.name)
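If a zoo object is preferred over a named vector, the same grouping idea (blocks of 8 days counted from January 1) can also be expressed with aggregate on the zoo object directly. This is a different technique from the plyr code above, offered as a sketch; doy, block.start and a8.zoo are illustrative names, not part of the original code.

```r
library(zoo)

# Example data, as in the question
indices <- seq.Date(as.Date("1985-01-01"), as.Date("1988-12-31"), by = "day")
a.zoo <- zoo(rnorm(length(indices)), order.by = indices)

d <- index(a.zoo)

# Zero-based day of year: January 1 is 0
doy <- as.POSIXlt(d)$yday

# Step each date back to the first day of its 8-day block;
# the block counter restarts every January 1
block.start <- d - (doy %% 8)

# Average within each block; the block start dates become the new index.
# The short final block of each year (5 or 6 days) is kept.
a8.zoo <- aggregate(a.zoo, block.start, mean)
head(a8.zoo)
```

Unlike a fixed-width rolling window, this keeps the shorter block at the end of each year, which matches what the ceiling(seq_along(...)/8) split does.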