I am running rolling regressions in R, with the data stored in a data.table.
I have a working version, but it feels like a hack: I am really just using what I know from the zoo package and none of the magic in data.table, so it feels slower than it ought to be.
Incorporating Joshua's suggestion (below), there is a speedup of ~12x from using lm.fit rather than lm.
(revised) Example code:
require(zoo)
require(data.table)
require(rbenchmark)
set.seed(1)
tt <- seq(as.Date("2011-01-01"), as.Date("2012-01-01"), by="day")
px <- rnorm(366, 95, 1)
DT <- data.table(period=tt, pvec=px)
dtt <- DT[,tnum:=as.numeric(period)][, list(pvec, tnum)]
dtx <- as.matrix(DT[,tnum:=as.numeric(period)][, tnum2:= tnum^2][, int:=1][, list(pvec, int, tnum, tnum2)])
rollreg <- function(dd) coef(lm(pvec ~ tnum + I(tnum^2), data=as.data.frame(dd)))
rollreg.fit <- function(dd) coef(lm.fit(y=dd[,1], x=dd[,-1]))
rr <- function(dd) rollapplyr(dd, width=20, FUN = rollreg, by.column=FALSE)
rr.fit <- function(dd) rollapplyr(dd, width=20, FUN = rollreg.fit, by.column=FALSE)
bmk <- benchmark(rr(dtt), rr.fit(dtx),
                 columns = c('test', 'elapsed', 'relative'),
                 replications = 10,
                 order = 'elapsed'
)
         test elapsed relative
2 rr.fit(dtx)    0.48   1.0000
1     rr(dtt)    5.85  12.1875
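As a sanity check on the swap, here is a minimal sketch showing that lm.fit returns the same coefficients as lm on a single window when it is handed the design matrix directly. It uses base R only. The simple index 1..20 in place of the numeric dates is my own assumption, made to keep the quadratic term well conditioned; it is not taken from the benchmark above.

```r
# Minimal sketch: lm() and lm.fit() agree on one 20-observation window.
# Assumption: tnum is a plain index 1..20, not as.numeric(Date).
set.seed(1)
n <- 20
tnum <- seq_len(n)
pvec <- rnorm(n, 95, 1)

# lm() builds the model matrix from the formula ...
coef_lm <- coef(lm(pvec ~ tnum + I(tnum^2)))

# ... while lm.fit() expects it ready-made, intercept column included.
X <- cbind(int = 1, tnum = tnum, tnum2 = tnum^2)
coef_fit <- coef(lm.fit(x = X, y = pvec))

# Only the coefficient names differ, so compare the values.
same <- isTRUE(all.equal(unname(coef_lm), unname(coef_fit)))
print(same)
```

The saving comes from skipping the formula parsing and model-frame construction that lm repeats on every window.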
Trying to apply the knowledge displayed here and here, I cooked up the following simple rolling regression function that I think uses some of the speed of data.table operations.
Note that the problem is a little different (and more realistic): take a vector, add lags, and regress on itself. This class of AR-type problems is pretty broad.
I am sharing it here as it may be of use, and I'm sure that it can be improved (I'll update as I improve it):
require(data.table)
set.seed(1)
x <- rnorm(1000)
DT <- data.table(x)
DTin <- data.table(x)
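# Add lags 1..l of column `varname` to DTin by reference (varname_L1, ..., varname_Ll).
lagDT <- function(DTin, varname, l=5)
{
  i <- 0
  while (i < l) {
    # varname_L(i+1) := (i+1) leading NAs, then varname without its last (i+1) values
    expr <- parse(text =
      paste0(varname, '_L', (i+1),
        ':= c(rep(NA, (1+i)),', varname, '[-((length(', varname, ') - i):length(', varname, '))])'
      )
    )
    DTin[, eval(expr)]
    i <- i + 1
  }
  return(DTin)
}
# Rolling regression of `varname` (assumed to be the first column) on l of its own lags,
# with k observations per fit. Row j of the result is the fit for the window ending at row j.
rollRegDT <- function(DTin, varname, k=20, l=5)
{
  adj <- k + l - 1                 # each window spans k + l rows
  .x <- 1:(nrow(DTin)-adj)
  DTin[, int:=1]                   # intercept column, added to DTin by reference
  # drop the first l rows (their lags are NA), then regress column 1 on the rest
  dtReg <- function(dd) coef(lm.fit(y=dd[-c(1:l),1], x=dd[-c(1:l),-1]))
  eleNum <- nrow(DTin)*(l+1)
  outMatx <- matrix(rep(NA, eleNum), ncol = (l+1))
  colnames(outMatx) <- c('intercept', paste0('L', 1:l))   # one name per lag
  for (i in .x){
    dt_m <- as.matrix(lagDT(DTin[i:(i+adj), ], varname, l))
    outMatx[(i+(adj)),] <- dtReg(dt_m)
  }
  return(outMatx)
}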
rollCoef <- rollRegDT(DT, varname='x')
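One possible improvement, sketched here under assumptions rather than benchmarked: compute the lags once for the whole table with data.table's shift() (available in newer data.table releases) instead of rebuilding them inside every window. The names lagShift, DTs and out are hypothetical and not part of the code above. For rows that sit at least l rows into a window, the within-window lags equal the global lags, so the fits should match those of rollRegDT.

```r
library(data.table)

# Hypothetical helper: add lags 1..l of `varname` by reference in one call.
lagShift <- function(dt, varname, l = 5) {
  cols <- paste0(varname, "_L", seq_len(l))
  dt[, (cols) := shift(dt[[varname]], n = seq_len(l), type = "lag")]
  invisible(dt)
}

set.seed(1)
DTs <- data.table(x = rnorm(1000))
k <- 20   # observations per regression
l <- 5    # number of lags
lagShift(DTs, "x", l)

# Build the design matrix once: intercept plus the l lag columns.
X <- cbind(int = 1, as.matrix(DTs[, paste0("x_L", seq_len(l)), with = FALSE]))
y <- DTs$x
n <- nrow(DTs)

out <- matrix(NA_real_, nrow = n, ncol = l + 1,
              dimnames = list(NULL, c("intercept", paste0("L", seq_len(l)))))

# The first usable window ends at row k + l; each fit uses the last k rows.
for (end in (k + l):n) {
  rows <- (end - k + 1):end
  out[end, ] <- coef(lm.fit(x = X[rows, , drop = FALSE], y = y[rows]))
}
```

This keeps the loop over windows but removes the per-window parse/eval and as.matrix calls; whether that is faster in practice would need to be measured.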
lm.fit directly and avoid the overhead of the lm function. – Joshua Ulrich