2
votes

I am working on a project to analyse and forecast time series of sales and revenue for a client. There are several models that I want to compare for accuracy: Holt's linear method, Holt-Winters, ARIMA, seasonal ARIMA, and ARIMAX (as I also want to consider categorical variables in the data). The data is daily, so I have chosen a frequency of 7.

startW <- as.numeric(strftime(head(revenue$date, 1), format = "%W"))
startD <- as.numeric(strftime(head(revenue$date, 1) + 1, format = "%w"))
revenue <- ts(revenue$amount, start = c(startW, startD), frequency = 7)

I then split it into training and test sets, keeping the last month as a hold-out set.
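A minimal sketch of such a split, using simulated data in place of the real revenue series. The names `y`, `h`, `train` and `test` are illustrative, and 28 days is an assumed stand-in for "the last month":

```r
# Simulated daily series with a weekly pattern (stand-in for the real data)
set.seed(1)
n <- 140   # number of daily observations
h <- 28    # hold-out length in days (assumed; roughly one month)
y <- ts(100 + 10 * sin(2 * pi * seq_len(n) / 7) + rnorm(n),
        start = c(1, 1), frequency = 7)

# Split on the time index so both pieces remain ts objects with frequency 7
tt <- time(y)
train <- window(y, end = tt[n - h])
test  <- window(y, start = tt[n - h + 1])
```

Splitting with `window()` rather than plain indexing keeps the weekly frequency attached to both pieces, which matters when the training set is later passed to a seasonal model.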

I used the auto.arima() function for the ARIMA model, and it returns ARIMA(0,0,0)(2,1,0)[7]. What does that imply? The residual plot looks like this: [Residual Plot 1]

Following this, I added holidays as an exogenous variable:

encoded_regressors <- sparse.model.matrix(amount~holiday, data = train_set)
encoded_regressors <- (encoded_regressors[,-1])
model2 <- auto.arima(revenue.train, xreg = encoded_regressors)
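One thing worth checking with a regressor like this is that it is passed as an ordinary numeric matrix (a sparse matrix may need converting with `as.matrix()`), and that holiday values for the forecast period are supplied when predicting. The sketch below illustrates the idea with base R's `stats::arima()` instead of `auto.arima()`, on simulated data; the holiday flag, the effect size of 30 and all variable names are assumptions made for illustration only:

```r
set.seed(42)
n <- 168   # simulated days
h <- 28    # hold-out length (assumed)
holiday <- rbinom(n, 1, 0.08)   # hypothetical 0/1 holiday flag
weekly  <- rep(c(0, 2, 4, 6, 8, 15, 12), length.out = n)
amount  <- 100 + weekly + 30 * holiday + rnorm(n, sd = 3)

# Training series plus dense one-column regressor matrices for both periods
y_train <- ts(amount[1:(n - h)], frequency = 7)
x_train <- matrix(holiday[1:(n - h)], ncol = 1,
                  dimnames = list(NULL, "holiday"))
x_test  <- matrix(holiday[(n - h + 1):n], ncol = 1,
                  dimnames = list(NULL, "holiday"))

# Regression on the holiday flag with ARIMA(0,0,1)(2,1,0)[7] errors
fit <- arima(y_train, order = c(0, 0, 1),
             seasonal = list(order = c(2, 1, 0), period = 7),
             xreg = x_train, method = "ML")

# Forecasting needs the regressor values for the hold-out period
fc <- predict(fit, n.ahead = h, newxreg = x_test)
```

The fitted coefficient named `holiday` is the estimated shift in the series on a holiday, and `fc$pred` holds the forecasts for the hold-out period.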

The model I get now is ARIMA(0,0,1)(2,1,0)[7], and here is the residual plot: [Residual Plot 2]

In both cases, the percentage difference between the predicted and observed values ranges from 3% to 50%. How can I improve my model, and how should I interpret the output of the ARIMA model?
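To put a single number on that error, one option is the mean absolute percentage error (MAPE) on the hold-out set, compared against a simple seasonal naive baseline that repeats the last observed week. This is a sketch on simulated data; `mape()` and `snaive_forecast()` are hypothetical helper names, not functions from any package:

```r
# Mean absolute percentage error, in percent (assumes no zero actuals)
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Seasonal naive forecast: repeat the last full week of the training data
snaive_forecast <- function(train, h, period = 7) {
  last_week <- tail(as.numeric(train), period)
  rep(last_week, length.out = h)
}

# Simulated daily data with a weekly pattern (stand-in for the real series)
set.seed(7)
n <- 140
h <- 28
amount <- 100 + rep(c(0, 2, 4, 6, 8, 15, 12), length.out = n) + rnorm(n, sd = 3)
train <- amount[1:(n - h)]
test  <- amount[(n - h + 1):n]

baseline_mape <- mape(test, snaive_forecast(train, h))
```

If a fitted ARIMA model does not beat `baseline_mape` on the same hold-out period, the extra model complexity is probably not helping.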

Thanks!

2 Answers

0
votes

You seem to be using auto.arima() from the forecast package. You can find a lot of good information about using this package and time series forecasting in R here. For the output that you have given, the three values in the first parentheses are the orders p, d, and q of the non-seasonal part of the ARIMA model: p is the autoregressive order, d is the order of differencing, and q is the moving average order. The three values in the second parentheses are the seasonal counterparts P, D, and Q, again the autoregressive, differencing, and moving average orders respectively. The number 7 in the brackets is the seasonal period, which is the frequency that you chose.
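To make the notation concrete: ARIMA(0,0,0)(2,1,0)[7] says that the week-over-week change y[t] - y[t-7] is modelled as a linear combination of that same change 7 and 14 days earlier, plus noise. As a rough sketch, the same specification can be fitted directly with base R's `stats::arima()`; the data here is simulated and stands in for the real series:

```r
set.seed(123)
n <- 210
weekly <- rep(c(0, 2, 4, 6, 8, 15, 12), length.out = n)
y <- ts(100 + weekly + rnorm(n, sd = 3), frequency = 7)

# ARIMA(0,0,0)(2,1,0)[7]: no non-seasonal terms (p = d = q = 0),
# two seasonal AR terms (P = 2), one seasonal difference (D = 1), period 7
fit <- arima(y, order = c(0, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 7),
             method = "ML")

coef(fit)   # two coefficients, named sar1 and sar2
fit$arma    # compact spec: c(p, q, P, Q, period, d, D)
```

The two seasonal AR coefficients are the weights on the changes 7 and 14 days back; there is no intercept because the series has been seasonally differenced.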

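In general, to find the best ARIMA model, you compare candidate models by the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) and pick the one with the lowest value. auto.arima() runs this search for you, using the corrected AIC (AICc) by default. Again, look at the link for more details.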
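A small sketch of that comparison done by hand with base R's `stats::arima()`, on simulated data. The candidate list is hypothetical; all candidates use the same differencing (d = 0, D = 1), since information criteria are only comparable between models fitted to the same differenced series:

```r
set.seed(99)
n <- 210
weekly <- rep(c(0, 2, 4, 6, 8, 15, 12), length.out = n)
y <- ts(100 + weekly + rnorm(n, sd = 3), frequency = 7)

# Hypothetical candidate set; all share d = 0 and D = 1
candidates <- list(
  "(0,0,0)(2,1,0)" = list(order = c(0, 0, 0), seasonal = c(2, 1, 0)),
  "(0,0,1)(2,1,0)" = list(order = c(0, 0, 1), seasonal = c(2, 1, 0)),
  "(0,0,0)(0,1,1)" = list(order = c(0, 0, 0), seasonal = c(0, 1, 1)),
  "(1,0,0)(0,1,1)" = list(order = c(1, 0, 0), seasonal = c(0, 1, 1))
)

# Fit each candidate and collect its AIC; a failed fit yields NA
aic_values <- sapply(candidates, function(spec) {
  fit <- tryCatch(
    arima(y, order = spec$order,
          seasonal = list(order = spec$seasonal, period = 7),
          method = "ML"),
    error = function(e) NULL
  )
  if (is.null(fit)) NA_real_ else fit$aic
})

best <- names(which.min(aic_values))   # label of the lowest-AIC candidate
```

A lower AIC only says a model fits better relative to the other candidates; it is still worth checking the residuals and the hold-out error of whichever model wins.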

0
votes

The ACF and PACF plots of the time series are shown below: [ACF Plot] [PACF Plot]

If my understanding is correct, does the ACF suggest q = 7 and the PACF suggest p = 7?
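For daily data, spikes at lags 7, 14 and 21 usually reflect weekly seasonality, which is typically handled with a seasonal difference or seasonal terms (P, D, Q) rather than by setting p or q to 7. A sketch of how one might check this with base R, on simulated data standing in for the real series:

```r
set.seed(5)
n <- 140
y <- ts(100 + 10 * sin(2 * pi * seq_len(n) / 7) + rnorm(n), frequency = 7)

# Sample ACF of the raw series; element k + 1 of $acf holds the lag-k value
raw_acf <- acf(y, lag.max = 21, plot = FALSE)$acf
raw_acf[c(8, 15, 22)]   # lags 7, 14, 21

# Same diagnostics after one seasonal difference (lag 7)
dy <- diff(y, lag = 7)
diff_acf  <- acf(dy, lag.max = 21, plot = FALSE)$acf
diff_pacf <- pacf(dy, lag.max = 21, plot = FALSE)$acf   # element k holds lag k
```

If the large seasonal spikes in the raw ACF shrink after differencing, the remaining pattern in `diff_acf` and `diff_pacf` is what guides the choice of the seasonal and non-seasonal orders.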