I am working on a project to analyse and forecast time series of sales and revenue for a client. There are several models that I want to compare for accuracy: Holt's linear method, the Holt-Winters method, ARIMA, seasonal ARIMA, and ARIMAX (as I also want to consider categorical variables in the data).
The data is daily, so I have chosen a frequency of 7 (weekly seasonality).
startW <- as.numeric(strftime(head(revenue$date, 1), format = "%W"))
startD <- as.numeric(strftime(head(revenue$date, 1), format = "%w")) + 1  # %w is 0-6 (Sunday = 0); shift to 1-7
revenue <- ts(revenue$amount, start = c(startW, startD), frequency = 7)
I then split it into training and test sets, keeping the last month as a hold-out set.
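For reference, the split can be sketched roughly as follows. This is a minimal base R sketch on simulated data, not the actual project code: the 120-day toy series, the 30-day hold-out length `h`, and the name `revenue.test` are assumptions for illustration.

```r
# Toy stand-in for the real series (assumption): 120 daily values with a weekly pattern
set.seed(1)
amount <- 100 + 10 * sin(2 * pi * (1:120) / 7) + rnorm(120)
revenue_ts <- ts(amount, frequency = 7)

h <- 30                      # assumed hold-out length: roughly the last month
n <- length(revenue_ts)
tt <- time(revenue_ts)       # time index of every observation

# window() keeps the ts attributes (start, frequency) on both pieces
revenue.train <- window(revenue_ts, end = tt[n - h])
revenue.test  <- window(revenue_ts, start = tt[n - h + 1])
```

Using `window()` rather than plain indexing keeps the weekly frequency attached to both pieces, which matters when the training series is passed to a seasonal model.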
I used the auto.arima() function for the ARIMA model, and it returns ARIMA(0,0,0)(2,1,0)[7]. What does that imply? The residual plot looks like this:
Following this, I added holidays as an exogenous variable:
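library(Matrix)    # sparse.model.matrix()
library(forecast)  # auto.arima()
# One-hot encode the holiday factor, then drop the intercept column
encoded_regressors <- sparse.model.matrix(amount ~ holiday, data = train_set)
encoded_regressors <- as.matrix(encoded_regressors[, -1, drop = FALSE])  # xreg should be a plain numeric matrix
model2 <- auto.arima(revenue.train, xreg = encoded_regressors)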
The model I get now is ARIMA(0,0,1)(2,1,0)[7], and here is the residual plot:
In both cases, when I compare the predicted and observed values on the hold-out set, the percentage difference ranges from about 3% to 50%. How can I improve my model, and how should I interpret the output of the ARIMA model?
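To make the comparison concrete, the percentage difference per day and a single summary figure (mean absolute percentage error) can be computed as in the sketch below. The helper name `mape` and the three toy values are hypothetical, chosen only to span the 3% to 50% range mentioned above; they are not my actual results.

```r
# Mean absolute percentage error, in percent (hypothetical helper)
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Toy numbers for illustration only
actual    <- c(100, 200, 400)
predicted <- c( 97, 220, 200)

pct_diff <- abs((actual - predicted) / actual) * 100  # per-day errors: 3, 10, 50
overall  <- mape(actual, predicted)                   # 21
```

A single figure like this makes it easier to compare the model with and without the holiday regressor on the same hold-out period.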
Thanks!