Rolling Origin Cross-validation with different horizons?

Question

I have a time series training dataset covering 2010-2016, with the number of observations per year listed in the table below. I want to perform rolling origin cross-validation in R, where the first fold uses the observations from 2010 for training and those from 2011 for testing, the second fold uses data from 2010 and 2011 for training and 2012 for testing, and so on. I have tried different functions such as rolling_origin and caret's trainControl, but sadly they seem to work only with a single forecast horizon value and a single skip value, whereas here the number of observations differs from year to year. I deeply appreciate any help, especially a code example!

2010	2011	2012	2013	2014	2015	2016
614	617	599	677	881	1215	1208

G. Grothendieck · Accepted Answer · 2021-04-24T15:14:32

Let x be the data and y[i] be the year of x[i], with y sorted in non-decreasing order (findInterval requires this). Then compute u, whose ith element is the index of the last data point in the ith unique year, and iterate over consecutive pairs of u: the first element of each pair is the last index of the training set and the second is the last index of the test set. The code below returns the training and test data for each iteration, but you can replace the line marked ## with whatever calculation you need.

y <- c(2000, 2000, 2000, 2001, 2001, 2002)
x <- 11:16

# index of the last observation of each year: 3, 5, 6
u <- unique(findInterval(y, y))

# input is last index of training and test sets in x
f <- function(itrain, itest) {    
  train <- x[ seq(1, itrain)]
  test <- x[ seq(itrain+1, itest) ]
  list(train = train, test = test)  ##
}
L <- Map(f, itrain = head(u, -1), itest = tail(u, -1))
names(L) <- y[ u[-1] ]

str(L)

giving this named list where the names are the years of the test set:

List of 2
 $ 2001:List of 2
  ..$ train: int [1:3] 11 12 13
  ..$ test : int [1:2] 14 15
 $ 2002:List of 2
  ..$ train: int [1:5] 11 12 13 14 15
  ..$ test : int 16
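
To connect this back to the question, here is a minimal sketch of the same idea applied to the yearly counts from the table. It is only an illustration under stated assumptions: the real data are not shown, so y is rebuilt from the counts, x is a simulated placeholder series, and the per-fold calculation (a naive training-mean forecast scored by RMSE) is a hypothetical stand-in for whatever replaces the ## line. The names counts, folds, sizes and rmse are not from the original answer.

```r
# Yearly observation counts copied from the table in the question.
counts <- c("2010" = 614, "2011" = 617, "2012" = 599, "2013" = 677,
            "2014" = 881, "2015" = 1215, "2016" = 1208)

# Assumption: rows are sorted by time, so y holds one year label per row.
y <- rep(as.integer(names(counts)), times = counts)

# Index of the last row of each year (equals the cumulative counts).
u <- unique(findInterval(y, y))

# Expanding-window folds: train on rows 1..itrain, test on rows itrain+1..itest.
# Row indices are stored instead of the data, so they can be reused on any object.
folds <- Map(function(itrain, itest) {
  list(train = seq_len(itrain), test = seq(itrain + 1, itest))
}, itrain = head(u, -1), itest = tail(u, -1))
names(folds) <- y[u[-1]]  # name each fold by its test year

# Number of training and test rows per fold (one row per test year).
sizes <- t(sapply(folds, lengths))
print(sizes)

# Hypothetical placeholder series and a naive per-fold evaluation.
set.seed(1)
x <- rnorm(length(y))
rmse <- sapply(folds, function(f) {
  pred <- mean(x[f$train])            # naive forecast: mean of the training rows
  sqrt(mean((x[f$test] - pred)^2))    # RMSE on the test year
})
print(rmse)
```

Because each fold's test set is a whole year, the horizon varies from fold to fold automatically. caret's trainControl also has index and indexOut arguments that take lists of row indices, so folds of this shape could probably be passed as index = lapply(folds, `[[`, "train") and indexOut = lapply(folds, `[[`, "test"); that route is untested here.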
