1
votes

How can I calculate a two-month rolling mean when the two-month windows are not of equal length? Preferably with data.table. Sample code:

set.seed(24)
test <- data.table(x = rnorm(762),time=seq(as.Date("1988/03/15"), as.Date("1990/04/15"), "day"))

Here, the first mean would cover 1988/03/15 until 1988/04/30, the second one 1988/04/01 until 1988/05/31, and so on. The number of days available in each month differs on purpose.

Do you mean, for example, that the first value will be the mean from 15/03 until 15/05, then 16/04 until 16/05, and so on? - talat
Thanks for the question. No, I meant the first one from 15/03 until 30/04. For the first month, there are just not as many days available. I edited the question. - User878239
What is the second value? - talat
In terms of speed I can recommend RcppRoll. It worked wonders for bigger data. - 5th

1 Answer

2
votes

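Add a yearmon column, then compute the sum of x and the number of rows N for each yearmon.
Finally, divide the two-month rolling sum of x by the two-month rolling sum of N. Because both sums run over the same two months, every day gets equal weight, so months with different numbers of days are handled correctly.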

library(data.table)
library(zoo)

Means <- test[, yearmon := as.yearmon(time)][
   , list(x = sum(x), N = .N), by = "yearmon"][
   , list(yearmon, mean = rollsumr(x, 2, fill = NA) / rollsumr(N, 2, fill = NA))]
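
As a sanity check, here is a minimal sketch of a zoo-free variant that rolls the same monthly sums with data.table's own frollsum. It is not part of the original answer: the month key is built with format(time, "%Y-%m"), and the names monthly, ym and direct are chosen purely for illustration. The first non-NA rolling mean should agree with the plain mean of all observations from 1988-03-15 to 1988-04-30.

```r
library(data.table)

set.seed(24)
test <- data.table(x = rnorm(762),
                   time = seq(as.Date("1988/03/15"), as.Date("1990/04/15"), "day"))

# Monthly sums and day counts; ym is a sortable "YYYY-MM" string (illustrative name)
monthly <- test[, .(x = sum(x), N = .N), by = .(ym = format(time, "%Y-%m"))]

# Right-aligned two-month rolling sums (frollsum defaults: align = "right", fill = NA),
# so the first month has no predecessor and its mean is NA
monthly[, mean := frollsum(x, 2) / frollsum(N, 2)]

# Direct mean over the first window (1988-03-15 to 1988-04-30) for comparison
direct <- test[time <= as.Date("1988-04-30"), mean(x)]
all.equal(monthly$mean[2], direct)
```

Grouping on a character key keeps the months in order of first appearance, which matches chronological order here only because test is already sorted by time.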

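Alternatively, convert test to a zoo object, sum x and the number of observations n by year/month, compute the rolling sums of both x and n, and divide. The result is a zoo object holding the year/months and the means. See ?fortify.zoo if you would like to convert that to a data frame. Note that the data.table code above adds a yearmon column to test by reference, so start again from the original two-column test shown in the Note.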

z <- cbind(x = read.zoo(test, index = "time"), n = 1)
zym <- aggregate(z, as.yearmon, sum)
transform(rollsumr(zym, 2), mean = x / n)
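
The conversion mentioned above could look roughly like the following sketch. It repeats the zoo steps on a freshly created test and then calls fortify.zoo; the names res and res_df are illustrative and not part of the original answer.

```r
library(data.table)
library(zoo)

set.seed(24)
test <- data.table(x = rnorm(762),
                   time = seq(as.Date("1988/03/15"), as.Date("1990/04/15"), "day"))

# Same steps as above: daily series plus a column of ones to count days
z <- cbind(x = read.zoo(test, index = "time"), n = 1)
zym <- aggregate(z, as.yearmon, sum)               # monthly sums of x and n
res <- transform(rollsumr(zym, 2), mean = x / n)   # two-month rolling mean

# fortify.zoo() turns the zoo object into a data frame,
# moving the yearmon index into a column named Index
res_df <- fortify.zoo(res)
head(res_df)
```

Since rollsumr is called without fill here, the first month is dropped rather than returned as NA, so res_df starts at the first complete two-month window.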

Note

Input used is:

set.seed(24)
test <- data.table(x = rnorm(762), time=seq(as.Date("1988/03/15"), 
                          as.Date("1990/04/15"), "day"))