Generic time-series backtesting/cross-validation with R

Question

I want to evaluate time-series models in R. The usual process is to define a time lag and the evaluation frequency/periods and then, for each evaluation period, train a model that respects the time lag and compute metrics for that period.

For example, we have:

Evaluation period size and interval n
Evaluation start at b
Time lag l

We train a model on points 1:(b - l) and evaluate it on b:(b + n). After that, we train a model on points 1:(b + n - l) and evaluate it on (b + n):(b + 2n), and so on, for k periods. The details could vary a bit, but that's the general spirit. So this is basically a sliding window for the evaluation data, but an expanding window for the training data.

This is illustrated in the answer to this question (the expanding window solution).
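To make the indexing concrete, here is a minimal base R sketch of the scheme described above. The helper name expanding_splits is hypothetical, and it assumes each evaluation period is half-open (it covers n points starting at its first index), so consecutive periods do not share a boundary point.

```r
# Hypothetical helper: build k expanding-window splits.
# b = first evaluation index, n = evaluation period size, l = time lag.
expanding_splits <- function(b, n, l, k) {
  lapply(seq_len(k) - 1, function(i) {
    eval_start <- b + i * n
    list(
      # Training data grows with each period: 1:(eval_start - l)
      train = seq_len(eval_start - l),
      # Evaluation data slides forward: n points starting at eval_start (assumption)
      eval = seq(eval_start, eval_start + n - 1)
    )
  })
}

# Example values chosen only for illustration
splits <- expanding_splits(b = 10, n = 5, l = 1, k = 3)
```

Each element of splits holds the row indices to train on and to evaluate on for one period, so a model-fitting function can be mapped over the list.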

How could I do this, preferably without loops and using the tidyverse and/or packages specific for time-series analysis?

gsmafra · Accepted Answer · 2018-04-06T15:24:43

This is how I'm doing it at the moment, but I'm really not satisfied with it: too much custom code, and it's not very modular.

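library(dplyr)

# Fit on all data before one evaluation period, then predict that period.
# dates_lim: named list with date_beg and date_end
# df: data frame with a date column, the response y, and the predictors
time_series_cv <- function(dates_lim, df) {

  # Evaluation window: date_beg <= date < date_end
  eval_data <-
    df %>%
    filter(
      date >= dates_lim[['date_beg']],
      date < dates_lim[['date_end']]
    )

  # Train on everything strictly before the window (expanding window).
  # The model is fit on log(y), so predictions are on the log scale.
  eval_data$prediction <-
    predict(
      lm(
        log(y) ~ .,
        df %>% filter(date < dates_lim[['date_beg']]) %>% select(-c(date))
      ),
      eval_data
    )

  eval_data %>%
    select(date, y, prediction)
}

# Run every evaluation period and stack the results into one data frame
predictions <-
  lapply(dates, time_series_cv, df = df) %>%
  bind_rows()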

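dates is a list of named lists, each holding the start (date_beg) and end (date_end) of one evaluation period. The lag is 1 sample here, because each model is trained on all rows dated before date_beg.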
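For illustration, one way to build such a dates list is sketched below. The monthly boundaries are made-up values, not taken from the original data; only the element names date_beg and date_end come from the function above.

```r
# Hypothetical period boundaries: three monthly evaluation periods
breaks <- seq(as.Date("2018-01-01"), as.Date("2018-04-01"), by = "month")

# One named list per period; each period ends where the next one begins
dates <- lapply(seq_len(length(breaks) - 1), function(i) {
  list(date_beg = breaks[i], date_end = breaks[i + 1])
})
```

Because time_series_cv filters with date >= date_beg and date < date_end, sharing a boundary between consecutive periods does not evaluate any row twice.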
