Training data set for auto.arima in R

Question

I have around 10000 time series.

I want to use the auto.arima function: http://www.inside-r.org/packages/cran/forecast/docs/auto.arima

I want to test the accuracy of my auto.arima models across the 10000 time series. My plan is to hold out 20% of the data points in each series (for example, 8 out of a sample of 40), let auto.arima forecast them, and then compare the 8 forecast values with the 8 actual values.
Is there a formal way to test the accuracy of an ARIMA model? Is my approach correct?

library(forecast)
y <- auto.arima(x)
plot(forecast(y, h = 8))

Sample time series 1

0.0003748,0.0003929,0.0003653,0.0003557,0.0004463,0.000349,0.0003099,0.0003395,0.0003157,0.0002871,0.0002604,0.0002422,0.0001917,0.0002117,0.0002689

time series 2

0.0003977,0.0003481,0.0002413,0.0002069,0.0002127,0.0002108,0.0002003,0.0002174,0.0002098,0.0002069,0.0001955,0.0001926,0.0002108,0.0002146,0.0002079

I'm not clear on the specific problem. Is the issue that the auto.arima function does not return a valid model for some of your time series? Or that you are struggling with coding a loop to auto-fit each of your 8000 time series? — Whitebeard
Yes, this was the question. Do I need to create a specific model for each time series? I now understand that I have to hold out some data points from each time series, run auto.arima on each series, and then test the accuracy. Do you know if there is a prebuilt function to test the accuracy of auto.arima? — user2961712
@SamThomas I have edited the question to explain. Please see the edit. Thanks :) — user2961712
If you want to use auto.arima, then yes, you need to fit one model per time series. See the help page, which specifies a univariate time series. See ?accuracy for measuring forecast accuracy on held-out data. Also see the hts package if you want to fit hierarchical time series. — Whitebeard
Can I have one model for all 8000 time series? Would I need to use the Arima() function for that? Also, do you mean that if I want to use auto.arima, I need to fit one model for EACH time series? — user2961712

Mark S · Accepted Answer · 2015-09-29T22:25:14

It sounds to me like your question is about the different metrics for comparing forecast accuracy, more than the specific use of auto.arima() and forecast(). If so, then there are a number of metrics that can be used. For an overview, see

https://en.wikipedia.org/wiki/Forecasting#Forecasting_accuracy

Each of them has its proponents and detractors; for example, see this paper:

http://robjhyndman.com/papers/mase.pdf

Independent of what accuracy metric you use, you still need to be able to justify why you are holding back 20% of the data for forecasting.
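As a rough sketch of the hold-out approach described in the question, the forecast package's accuracy() function reports several of these metrics (RMSE, MAE, MAPE, MASE and others) for both the training and test portions. The example below uses "Sample time series 1" from the question; the 20% split and the variable names (x_ts, train, test, fit, fc, acc) are illustrative assumptions, not a recommendation.

```r
library(forecast)  # provides auto.arima(), forecast(), accuracy()

# "Sample time series 1" from the question
x <- c(0.0003748, 0.0003929, 0.0003653, 0.0003557, 0.0004463,
       0.000349, 0.0003099, 0.0003395, 0.0003157, 0.0002871,
       0.0002604, 0.0002422, 0.0001917, 0.0002117, 0.0002689)

x_ts <- ts(x)
n <- length(x_ts)
h <- round(0.2 * n)                      # hold out the last 20% (3 of 15 points)

train <- window(x_ts, end = n - h)       # first 12 observations
test  <- window(x_ts, start = n - h + 1) # last 3 observations

fit <- auto.arima(train)                 # model selected on training data only
fc  <- forecast(fit, h = h)              # forecast the held-out horizon

# Rows: "Training set" and "Test set"; columns include RMSE, MAE, MAPE, MASE
acc <- accuracy(fc, test)
print(acc)
```

The "Test set" row is the one that reflects out-of-sample performance; with only a handful of held-out points per series, these numbers will be noisy.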

If, however, you are interested in the different model forms, then you also have some options. For example, as suggested in the comments,

1. fit the same univariate model (specified a priori) to each time series using arima() (or some equivalent);
2. fit a (potentially) different univariate model to each time series using auto.arima(); or
3. fit a multivariate model to all time series.
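The first two options can be sketched as a loop over the series. This is only an illustration under stated assumptions: series_list holds the two sample series from the question, holdout_accuracy() is a hypothetical helper written for this example, and the fixed order c(0, 1, 1) is chosen arbitrarily rather than taken from the answer.

```r
library(forecast)

# The two sample series from the question, stored as a named list
series_list <- list(
  s1 = c(0.0003748, 0.0003929, 0.0003653, 0.0003557, 0.0004463,
         0.000349, 0.0003099, 0.0003395, 0.0003157, 0.0002871,
         0.0002604, 0.0002422, 0.0001917, 0.0002117, 0.0002689),
  s2 = c(0.0003977, 0.0003481, 0.0002413, 0.0002069, 0.0002127,
         0.0002108, 0.0002003, 0.0002174, 0.0002098, 0.0002069,
         0.0001955, 0.0001926, 0.0002108, 0.0002146, 0.0002079)
)

# Hypothetical helper: hold out the last `frac` of a series, fit it with
# `fit_fun`, and return test-set accuracy metrics
holdout_accuracy <- function(x, fit_fun, frac = 0.2) {
  x <- ts(x)
  n <- length(x)
  h <- max(1, round(frac * n))
  train <- window(x, end = n - h)
  test  <- window(x, start = n - h + 1)
  fc <- forecast(fit_fun(train), h = h)
  accuracy(fc, test)["Test set", c("RMSE", "MAE", "MASE")]
}

# Option 1: the same model form for every series (order is an arbitrary example)
fixed <- t(sapply(series_list, holdout_accuracy,
                  fit_fun = function(y) Arima(y, order = c(0, 1, 1))))

# Option 2: auto.arima() may select a different order for each series
auto <- t(sapply(series_list, holdout_accuracy, fit_fun = auto.arima))

print(fixed)  # one row per series, one column per metric
print(auto)
```

With thousands of series, the same helper could be applied to the full list, and the resulting per-series metrics summarised (for example, by their median) to compare the two options.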

If it's #3 you're interested in, I'd suggest the MARSS package here:

https://cran.r-project.org/web/packages/MARSS/index.html

and user's guide here:

https://cran.r-project.org/web/packages/MARSS/vignettes/UserGuide.pdf
