
I have a time series training dataset from the period 2010-2016, with the number of observations per year listed in the table below. I want to perform rolling origin cross-validation in R, where the first fold uses the observations from 2010 for training and 2011 for testing, the second fold uses the data from 2010 and 2011 for training and 2012 for testing, and so on. I have tried functions such as rolling_origin and caret's trainControl, but they seem to support only a single fixed forecast horizon and a single skip value. In my case each test set is a whole year, and the years contain different numbers of observations. I would deeply appreciate any help, especially a code example!

Year          2010  2011  2012  2013  2014  2015  2016
Observations   614   617   599   677   881  1215  1208
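As an illustration of the scheme described above (not part of the original question), the fold boundaries implied by the table can be computed from cumulative counts, assuming the observations are sorted by year. It shows that the test sets differ in size from fold to fold, which is why a single fixed horizon does not fit this setup.

```r
# Observation counts per year, taken from the table above
n <- c(`2010` = 614, `2011` = 617, `2012` = 599, `2013` = 677,
       `2014` = 881, `2015` = 1215, `2016` = 1208)

ends <- cumsum(n)  # index of the last observation in each year

# Fold k trains on rows 1..ends[k] and tests on rows ends[k]+1..ends[k+1]
folds <- data.frame(
  test_year  = names(n)[-1],
  train_rows = unname(head(ends, -1)),
  test_rows  = unname(n[-1])
)
folds
```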

1 Answer


Let x be the data, sorted by year, and let y[i] be the year of x[i]. Compute u, where u[i] is the index of the last data point in the ith unique year, and then iterate over consecutive pairs of elements of u: the first element of each pair is the index of the last point in the training set and the second is the index of the last point in the test set. The code below returns the training and test data for each iteration, but you can replace the line marked ## with whatever calculation you need.

y <- c(2000, 2000, 2000, 2001, 2001, 2002)
x <- 11:16

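# u[i] is the index of the last element of the ith unique year;
# findInterval() requires y to be sorted in non-decreasing order
u <- unique(findInterval(y, y))  # 3, 5, 6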

# input is last index of training and test sets in x
f <- function(itrain, itest) {    
  train <- x[ seq(1, itrain)]
  test <- x[ seq(itrain+1, itest) ]
  list(train = train, test = test)  ##
}
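# one fold per consecutive pair of year ends: train up to u[k], test up to u[k+1]
L <- Map(f, itrain = head(u, -1), itest = tail(u, -1))
names(L) <- y[ u[-1] ]  # name each fold by its test year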

str(L)

This gives the following named list, where each name is the year of the test set:

List of 2
 $ 2001:List of 2
  ..$ train: int [1:3] 11 12 13
  ..$ test : int [1:2] 14 15
 $ 2002:List of 2
  ..$ train: int [1:5] 11 12 13 14 15
  ..$ test : int 16
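A sketch applying the same idea to the counts from the question. The vector x below is a placeholder (seq_along(y)) standing in for the real series, and the naive mean forecast used to score each fold is only an example of what could replace the ## line; both are assumptions, not part of the answer above. The lists index and indexOut hold the training and test row numbers of each fold.

```r
# Year label for every observation, using the counts from the question.
# The rows must be sorted by year for findInterval() to work.
n <- c(614, 617, 599, 677, 881, 1215, 1208)
y <- rep(2010:2016, times = n)
x <- as.numeric(seq_along(y))  # placeholder: replace with the real series

u <- unique(findInterval(y, y))  # last row of each year: 614, 1231, ..., 5811

# Expanding window: fold k trains on rows 1..u[k], tests on rows u[k]+1..u[k+1]
index    <- lapply(head(u, -1), seq_len)
indexOut <- Map(function(itrain, itest) seq(itrain + 1, itest),
                head(u, -1), tail(u, -1))
names(index)    <- paste0("test_", y[u[-1]])
names(indexOut) <- names(index)

# Illustration of replacing the ## line: RMSE of a naive mean forecast per fold
rmse <- mapply(function(tr, te) sqrt(mean((x[te] - mean(x[tr]))^2)),
               index, indexOut)

lengths(index)     # training rows per fold
lengths(indexOut)  # test rows per fold
rmse
```

Because the folds are explicit row indices, lists in this shape should be usable as the index and indexOut arguments of caret's trainControl(), which avoids the fixed horizon and skip of its built-in time-slice resampling. indexOut matters here: if it is left NULL, caret holds out every row not used for training, which would include the later years as well.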