I am trying to modify some code that I have, which works, to instead work with a different function for estimating a model. The original code is the following, and it works with the ARIMA function:
S=round(0.75*length(ts_HHFCE_log))
h=1
error1.h <- c()
for (i in S:(length(ts_HHFCE_log)-h))
{
mymodel.sub <- arima(ts_HHFCE_log[1:i], order = c(0,1,3),seasonal=c(0,0,0))
predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
error1.h <- c(error1.h,ts_HHFCE_log[i+h]-predict.h)
}
The intuition is the following: Your time series has length T. You start somewhere at the beginning of your sample, but to give enough observations to regress and obtain parameter coefficients for your alpha and betas. Let's call this t for simplicity. Then based on this, you produce a one-step ahead forecast, so for time period (t+1). Then your forecast error is the difference between the actual value for (t+1) and your forecast value based on regressing on data available until t. Then you iterate, and consider from the start to (t+1), regress, and forecast (t+2). Then you obtain a forecast error for (t+2). Then basically you keep on doing this iterative process until you reach (T-1) and produce a forecast for T. This provides with what is known as a dynamic out of sample forecast error series. You do this for different models and then ascertain using a statistical test which is the more appropriate model to use. It is a way to produce out of sample forecasting using only the data you already have.
I have modified the code to be the following:
S=round(0.75*length(ts.GDP))
h=1
error1.h <- c()
for (i in S:(length(ts.GDP)-h))
{
mymodel.sub <- lm(ts.GDP[4:i] ~ ts.GDP[3:(i-1)] + ts.GDP[2:(i-2)] + ts.GDP[1:(i-3)])
predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
error1.h <- c(error1.h,ts.GDP[i+h]-predict.h)
}
I'm trying to do an AR(3) model. The reason I am not using the ARIMA function is because I also then want to compare these forecast errors with an ARDL model, and to my knowledge there is no simple function for the ARDL model (I'd have to use lm()
, hence why I want to do the AR(3) model using the lm()
function).
The model I wish to compare the AR(3) model is the following:
model_ts.GDP_1 <- lm(ts.GDP[4:123] ~ ts.GDP[3:122] + ts.GDP[2:121] + ts.GDP[1:120] + ts.CCI_AGG[3:122] + ts.CCI_AGG[2:121] + ts.CCI_AGG[1:120])
I am unsure how further to modify the code to get what I am after. Hopefully the intuition bit I explained should be clear in what I am trying to do.
The data for GDP is basically the quarterly growth rate. It is stationary. The other variable in the second model is an index I've constructed using a dynamic PCA and taken first differences so it too is stationary. But in any case, in the second model, the forecast at t is based only on lagged data of each GDP and the index I constructed. Equally, given I am simulating out of sample forecast using data I have, there is no issue with actually properly forecasting. (In time series, this technique is seen as a more robust method to compare models than simply using things such as RMSE, etc.)
Thanks!
The data I am using:
Date GDP_qoq CCI_A_qoq
31/03/1988 2.956 0.540
30/06/1988 2.126 -0.743
30/09/1988 3.442 0.977
31/12/1988 3.375 -0.677
31/03/1989 2.101 0.535
30/06/1989 1.787 -0.667
30/09/1989 2.791 0.343
31/12/1989 2.233 -0.334
31/03/1990 1.961 0.520
30/06/1990 2.758 -0.763
30/09/1990 1.879 0.438
31/12/1990 0.287 -0.708
31/03/1991 1.796 -0.078
30/06/1991 1.193 -0.735
30/09/1991 0.908 0.896
31/12/1991 1.446 0.163
31/03/1992 0.870 0.361
30/06/1992 0.215 -0.587
30/09/1992 0.262 0.238
31/12/1992 1.646 -1.436
31/03/1993 2.375 0.646
30/06/1993 0.249 -0.218
30/09/1993 1.806 0.676
31/12/1993 1.218 -0.393
31/03/1994 1.501 0.346
30/06/1994 0.879 -0.501
30/09/1994 1.123 0.731
31/12/1994 2.089 0.062
31/03/1995 0.386 0.475
30/06/1995 1.238 -0.243
30/09/1995 1.836 0.263
31/12/1995 1.236 -0.125
31/03/1996 1.926 -0.228
30/06/1996 2.109 -0.013
30/09/1996 1.312 0.196
31/12/1996 0.972 -0.015
31/03/1997 1.028 -0.001
30/06/1997 1.086 -0.016
30/09/1997 2.822 0.156
31/12/1997 -0.818 -0.062
31/03/1998 1.418 0.408
30/06/1998 0.970 -0.548
30/09/1998 0.968 0.466
31/12/1998 2.826 -0.460
31/03/1999 0.599 0.228
30/06/1999 -0.651 -0.361
30/09/1999 1.289 0.579
31/12/1999 1.600 0.196
31/03/2000 2.324 0.535
30/06/2000 1.368 -0.499
30/09/2000 0.825 0.440
31/12/2000 0.378 -0.414
31/03/2001 0.868 0.478
30/06/2001 1.801 -0.521
30/09/2001 0.319 0.068
31/12/2001 0.877 0.045
31/03/2002 1.253 0.061
30/06/2002 1.247 -0.013
30/09/2002 1.513 0.625
31/12/2002 1.756 0.125
31/03/2003 1.443 -0.088
30/06/2003 0.874 -0.138
30/09/2003 1.524 0.122
31/12/2003 1.831 -0.075
31/03/2004 0.780 0.395
30/06/2004 1.665 -0.263
30/09/2004 0.390 0.543
31/12/2004 0.886 -0.348
31/03/2005 1.372 0.500
30/06/2005 2.574 -0.066
30/09/2005 0.961 0.058
31/12/2005 2.378 -0.061
31/03/2006 1.015 0.212
30/06/2006 1.008 -0.218
30/09/2006 1.105 0.593
31/12/2006 0.943 -0.144
31/03/2007 1.566 0.111
30/06/2007 1.003 -0.125
30/09/2007 1.810 0.268
31/12/2007 1.275 -0.592
31/03/2008 1.413 0.017
30/06/2008 -0.491 -0.891
30/09/2008 -0.617 -0.836
31/12/2008 -1.410 -1.092
31/03/2009 -1.593 0.182
30/06/2009 -0.106 -0.922
30/09/2009 0.788 0.351
31/12/2009 0.247 0.414
31/03/2010 1.221 -0.329
30/06/2010 1.561 -0.322
30/09/2010 0.163 0.376
31/12/2010 0.825 -0.104
31/03/2011 2.484 0.063
30/06/2011 -0.574 -0.107
30/09/2011 0.361 -0.006
31/12/2011 0.997 -0.304
31/03/2012 0.760 0.243
30/06/2012 0.143 -0.381
30/09/2012 2.547 0.315
31/12/2012 0.308 -0.046
31/03/2013 0.679 0.221
30/06/2013 0.766 -0.170
30/09/2013 1.843 0.352
31/12/2013 0.756 0.080
31/03/2014 1.380 -0.080
30/06/2014 1.501 0.162
30/09/2014 0.876 0.017
31/12/2014 0.055 -0.251
31/03/2015 0.497 0.442
30/06/2015 1.698 -0.278
30/09/2015 0.066 0.397
31/12/2015 0.470 0.076
31/03/2016 1.581 0.247
30/06/2016 0.859 -0.342
30/09/2016 0.865 -0.011
31/12/2016 1.467 0.049
31/03/2017 1.006 0.087
30/06/2017 0.437 -0.215
30/09/2017 0.527 0.098
31/12/2017 0.900 0.218
lm
model you don't need to add the line$pred[h]
afterpredict
(unlike time series models). Also, for time series models the parametern.ahead
insidepredict
makes sense but in the case of linear regression it doesn't make sense because you need new observations if you want to get new predictions. It's not clear what your new observations are in your situation because you didn't provide a reproducible example – antonioACR1