How to implement cross-validation (on a rolling forecasting origin) using ARIMA?

Question

Suppose I have a time-series dataset and use 90% of it as the training set and 10% as a random validation set. How do I evaluate the accuracy of an ARIMA model?

Should I fit the ARIMA model with auto.arima on 100% of the dataset, and then iteratively refit that model to the training set with forecast::Arima to predict the validation set?

OR

Or should I iteratively fit the ARIMA model with auto.arima on the training set and predict the validation set, so that a different model may be selected each time and no refitting of a fixed model is done?

I always thought it was the first option; however, my model behaves strangely when I do this with Fourier terms to capture multiple seasonality.

I would really appreciate it if someone could help me out.

devinbrowndev · Accepted Answer · 2019-11-01T14:39:10

I highly recommend reading "Evaluating forecast accuracy" (Section 3.4) of Rob Hyndman's Forecasting: Principles and Practice, which covers cross-validating time-series models.

https://otexts.com/fpp2/accuracy.html

There are many different techniques for cross-validating time-series models, and the right one will most likely depend on what you're trying to forecast.

Example #1 - Let's say I have monthly sales from 2014-2018 and I want to build a model to predict monthly sales for FY 2019. I would train my ARIMA model on 2014-2017, forecast 12 months ahead, and compare those forecasts with the actual monthly sales of 2018, which I hold out as my test set, using an error measure such as the mean absolute percentage error (MAPE, also discussed in Hyndman's book). Note that your prediction intervals will get increasingly wide the further ahead you forecast from your last observed data point.
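A minimal sketch of Example #1, written in Python with only the standard library rather than in R, so it does not use the `forecast` API. It holds out the last 12 months, forecasts all of them from a single fit, and scores the forecasts with MAPE. Assumptions: the names `seasonal_naive_forecast` and `holdout_mape` are hypothetical, the seasonal naive forecaster is a stand-in for a fitted ARIMA model (it simply repeats the last observed year), and the sales figures are synthetic.

```python
from typing import List, Sequence


def seasonal_naive_forecast(train: Sequence[float], h: int, period: int = 12) -> List[float]:
    """Hypothetical stand-in for a fitted ARIMA model: repeat the last observed season."""
    if len(train) < period:
        raise ValueError("need at least one full season of training data")
    last_season = list(train[-period:])
    return [last_season[i % period] for i in range(h)]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def holdout_mape(series: Sequence[float], test_size: int, period: int = 12) -> float:
    """Fit once on all but the last `test_size` points, then forecast them all at once."""
    train, test = series[:-test_size], series[-test_size:]
    forecast = seasonal_naive_forecast(train, h=test_size, period=period)
    return mape(test, forecast)


# Hypothetical monthly sales, Jan 2014 .. Dec 2018 (60 points): yearly trend + month effect.
sales = [100 + 10 * year + month for year in range(5) for month in range(12)]
# Train on 2014-2017 (first 48 months), test on the 12 months of 2018.
print(round(holdout_mape(sales, test_size=12), 2))
```

In R you would swap the stand-in for your actual ARIMA fit and its 12-step forecast; the split and the error measure stay the same.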

Example #2 - Same forecasting problem of monthly sales. I could instead train the model on Jan 2014 - Dec 2017 and this time forecast only 1 month ahead (Jan 2018). Then train the model on Jan 2014 - Jan 2018 and forecast Feb 2018, then train on Jan 2014 - Feb 2018 and forecast Mar 2018, and so forth. This is evaluation on a rolling forecasting origin, also called time-series cross-validation. The image below is a good depiction of this methodology.
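The rolling-origin loop from Example #2 can be sketched as follows, again in Python with the standard library rather than R. At each origin the model is refit on all data up to that point (an expanding window) and only the next month is forecast. Assumptions: `rolling_origin_one_step` and `naive_forecast` are hypothetical names, the naive forecaster (repeat the last observation) stands in for refitting ARIMA, and the sales data are synthetic.

```python
from typing import Callable, List, Sequence

# A forecaster takes the training data and a horizon h and returns h point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_forecast(train: Sequence[float], h: int) -> List[float]:
    """Hypothetical stand-in for refitting ARIMA: repeat the last observation h times."""
    return [train[-1]] * h


def rolling_origin_one_step(series: Sequence[float], initial: int,
                            forecaster: Forecaster) -> List[float]:
    """For every origin t >= initial, refit on series[:t] and forecast series[t]."""
    errors = []
    for t in range(initial, len(series)):
        prediction = forecaster(series[:t], 1)[0]  # fit on observations 0..t-1
        errors.append(series[t] - prediction)      # one-step-ahead forecast error
    return errors


# Hypothetical monthly sales, Jan 2014 .. Dec 2018 (60 points).
sales = [100 + 10 * year + month for year in range(5) for month in range(12)]
# First fit uses Jan 2014 .. Dec 2017 (48 points); then one forecast per month of 2018.
errors = rolling_origin_one_step(sales, initial=48, forecaster=naive_forecast)
mae = sum(abs(e) for e in errors) / len(errors)
print(errors, mae)
```

Averaging the one-step errors over all origins gives a more stable accuracy estimate than the single holdout in Example #1, at the cost of refitting the model once per origin.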

The book discusses other ways to cross-validate as well, which, again, I recommend reading. R also has a lot of time-series-specific cross-validation functionality, such as the tsCV() function in the forecast package.
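As a rough, language-neutral illustration of the kind of output tsCV() produces, here is a Python sketch (standard library only, not the R API) that builds an error table with one row per forecast origin and one column per horizon 1..h, leaving `None` where the target lies past the end of the series (tsCV uses NA there). It then summarizes the errors as RMSE per horizon. Assumptions: `ts_cv_errors`, `rmse_by_horizon`, `min_train`, and `window` are hypothetical names, the naive forecaster stands in for an ARIMA fit, and the data are synthetic.

```python
import math
from typing import Callable, List, Optional, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_forecast(train: Sequence[float], h: int) -> List[float]:
    """Hypothetical stand-in for an ARIMA fit: repeat the last observation h times."""
    return [train[-1]] * h


def ts_cv_errors(series: Sequence[float], forecaster: Forecaster, h: int = 1,
                 min_train: int = 1,
                 window: Optional[int] = None) -> List[List[Optional[float]]]:
    """One row per origin; row[j - 1] is the horizon-j error, or None if unavailable.
    min_train must be >= 1; window=None means an expanding window."""
    rows = []
    for end in range(min_train, len(series)):
        start = 0 if window is None else max(0, end - window)  # expanding vs sliding
        forecasts = forecaster(series[start:end], h)
        row: List[Optional[float]] = []
        for j in range(1, h + 1):
            target = end + j - 1  # index of the observation forecast at horizon j
            row.append(series[target] - forecasts[j - 1] if target < len(series) else None)
        rows.append(row)
    return rows


def rmse_by_horizon(rows: List[List[Optional[float]]], h: int) -> List[float]:
    """Root mean squared error for each horizon, skipping missing (None) entries."""
    result = []
    for j in range(h):
        column = [row[j] for row in rows if row[j] is not None]
        result.append(math.sqrt(sum(e * e for e in column) / len(column)) if column else math.nan)
    return result


# Hypothetical monthly sales, Jan 2014 .. Dec 2018 (60 points).
sales = [100 + 10 * year + month for year in range(5) for month in range(12)]
rows = ts_cv_errors(sales, naive_forecast, h=3, min_train=48)
print(rmse_by_horizon(rows, h=3))
```

Errors typically grow with the horizon, which mirrors the widening prediction intervals mentioned in Example #1.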

Hope this helps. Good luck!
