Error trying to produce forecast errors in R

Question

I am trying to modify some code that I have, which works, to instead work with a different function for estimating a model. The original code is the following, and it works with the ARIMA function:

S=round(0.75*length(ts_HHFCE_log))
h=1

error1.h <- c()
for (i in S:(length(ts_HHFCE_log)-h))
{
  mymodel.sub <- arima(ts_HHFCE_log[1:i], order = c(0,1,3),seasonal=c(0,0,0))
  predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
  error1.h <- c(error1.h,ts_HHFCE_log[i+h]-predict.h)
}

The intuition is the following: Your time series has length T. You start somewhere at the beginning of your sample, but to give enough observations to regress and obtain parameter coefficients for your alpha and betas. Let's call this t for simplicity. Then based on this, you produce a one-step ahead forecast, so for time period (t+1). Then your forecast error is the difference between the actual value for (t+1) and your forecast value based on regressing on data available until t. Then you iterate, and consider from the start to (t+1), regress, and forecast (t+2). Then you obtain a forecast error for (t+2). Then basically you keep on doing this iterative process until you reach (T-1) and produce a forecast for T. This provides with what is known as a dynamic out of sample forecast error series. You do this for different models and then ascertain using a statistical test which is the more appropriate model to use. It is a way to produce out of sample forecasting using only the data you already have.

I have modified the code to be the following:

S=round(0.75*length(ts.GDP))
h=1

error1.h <- c()
for (i in S:(length(ts.GDP)-h))
{
  mymodel.sub <- lm(ts.GDP[4:i] ~ ts.GDP[3:(i-1)] + ts.GDP[2:(i-2)] + ts.GDP[1:(i-3)])
  predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
  error1.h <- c(error1.h,ts.GDP[i+h]-predict.h)
}

I'm trying to do an AR(3) model. The reason I am not using the ARIMA function is because I also then want to compare these forecast errors with an ARDL model, and to my knowledge there is no simple function for the ARDL model (I'd have to use lm(), hence why I want to do the AR(3) model using the lm() function).

The model I wish to compare the AR(3) model is the following:

model_ts.GDP_1 <- lm(ts.GDP[4:123] ~ ts.GDP[3:122] + ts.GDP[2:121] + ts.GDP[1:120] + ts.CCI_AGG[3:122] + ts.CCI_AGG[2:121] + ts.CCI_AGG[1:120])

I am unsure how further to modify the code to get what I am after. Hopefully the intuition bit I explained should be clear in what I am trying to do.

The data for GDP is basically the quarterly growth rate. It is stationary. The other variable in the second model is an index I've constructed using a dynamic PCA and taken first differences so it too is stationary. But in any case, in the second model, the forecast at t is based only on lagged data of each GDP and the index I constructed. Equally, given I am simulating out of sample forecast using data I have, there is no issue with actually properly forecasting. (In time series, this technique is seen as a more robust method to compare models than simply using things such as RMSE, etc.)

Thanks!

The data I am using:

Date    GDP_qoq CCI_A_qoq
31/03/1988  2.956   0.540
30/06/1988  2.126   -0.743
30/09/1988  3.442   0.977
31/12/1988  3.375   -0.677
31/03/1989  2.101   0.535
30/06/1989  1.787   -0.667
30/09/1989  2.791   0.343
31/12/1989  2.233   -0.334
31/03/1990  1.961   0.520
30/06/1990  2.758   -0.763
30/09/1990  1.879   0.438
31/12/1990  0.287   -0.708
31/03/1991  1.796   -0.078
30/06/1991  1.193   -0.735
30/09/1991  0.908   0.896
31/12/1991  1.446   0.163
31/03/1992  0.870   0.361
30/06/1992  0.215   -0.587
30/09/1992  0.262   0.238
31/12/1992  1.646   -1.436
31/03/1993  2.375   0.646
30/06/1993  0.249   -0.218
30/09/1993  1.806   0.676
31/12/1993  1.218   -0.393
31/03/1994  1.501   0.346
30/06/1994  0.879   -0.501
30/09/1994  1.123   0.731
31/12/1994  2.089   0.062
31/03/1995  0.386   0.475
30/06/1995  1.238   -0.243
30/09/1995  1.836   0.263
31/12/1995  1.236   -0.125
31/03/1996  1.926   -0.228
30/06/1996  2.109   -0.013
30/09/1996  1.312   0.196
31/12/1996  0.972   -0.015
31/03/1997  1.028   -0.001
30/06/1997  1.086   -0.016
30/09/1997  2.822   0.156
31/12/1997  -0.818  -0.062
31/03/1998  1.418   0.408
30/06/1998  0.970   -0.548
30/09/1998  0.968   0.466
31/12/1998  2.826   -0.460
31/03/1999  0.599   0.228
30/06/1999  -0.651  -0.361
30/09/1999  1.289   0.579
31/12/1999  1.600   0.196
31/03/2000  2.324   0.535
30/06/2000  1.368   -0.499
30/09/2000  0.825   0.440
31/12/2000  0.378   -0.414
31/03/2001  0.868   0.478
30/06/2001  1.801   -0.521
30/09/2001  0.319   0.068
31/12/2001  0.877   0.045
31/03/2002  1.253   0.061
30/06/2002  1.247   -0.013
30/09/2002  1.513   0.625
31/12/2002  1.756   0.125
31/03/2003  1.443   -0.088
30/06/2003  0.874   -0.138
30/09/2003  1.524   0.122
31/12/2003  1.831   -0.075
31/03/2004  0.780   0.395
30/06/2004  1.665   -0.263
30/09/2004  0.390   0.543
31/12/2004  0.886   -0.348
31/03/2005  1.372   0.500
30/06/2005  2.574   -0.066
30/09/2005  0.961   0.058
31/12/2005  2.378   -0.061
31/03/2006  1.015   0.212
30/06/2006  1.008   -0.218
30/09/2006  1.105   0.593
31/12/2006  0.943   -0.144
31/03/2007  1.566   0.111
30/06/2007  1.003   -0.125
30/09/2007  1.810   0.268
31/12/2007  1.275   -0.592
31/03/2008  1.413   0.017
30/06/2008  -0.491  -0.891
30/09/2008  -0.617  -0.836
31/12/2008  -1.410  -1.092
31/03/2009  -1.593  0.182
30/06/2009  -0.106  -0.922
30/09/2009  0.788   0.351
31/12/2009  0.247   0.414
31/03/2010  1.221   -0.329
30/06/2010  1.561   -0.322
30/09/2010  0.163   0.376
31/12/2010  0.825   -0.104
31/03/2011  2.484   0.063
30/06/2011  -0.574  -0.107
30/09/2011  0.361   -0.006
31/12/2011  0.997   -0.304
31/03/2012  0.760   0.243
30/06/2012  0.143   -0.381
30/09/2012  2.547   0.315
31/12/2012  0.308   -0.046
31/03/2013  0.679   0.221
30/06/2013  0.766   -0.170
30/09/2013  1.843   0.352
31/12/2013  0.756   0.080
31/03/2014  1.380   -0.080
30/06/2014  1.501   0.162
30/09/2014  0.876   0.017
31/12/2014  0.055   -0.251
31/03/2015  0.497   0.442
30/06/2015  1.698   -0.278
30/09/2015  0.066   0.397
31/12/2015  0.470   0.076
31/03/2016  1.581   0.247
30/06/2016  0.859   -0.342
30/09/2016  0.865   -0.011
31/12/2016  1.467   0.049
31/03/2017  1.006   0.087
30/06/2017  0.437   -0.215
30/09/2017  0.527   0.098
31/12/2017  0.900   0.218

Could you provide a reproducible example with a small subset of your data? It is difficult to help without knowing how the data looks like. Also, try to find out which of your commands produces the error; for example by commenting out all but the first line in your for loop and so on. — otwtm
When you want to get predictions from a fitted lm model you don't need to add the line $pred[h] after predict (unlike time series models). Also, for time series models the parameter n.ahead inside predict makes sense but in the case of linear regression it doesn't make sense because you need new observations if you want to get new predictions. It's not clear what your new observations are in your situation because you didn't provide a reproducible example — antonioACR1
@otwtm Thanks both. I have edited my original question, given the original code I have which works perfectly to do what I want, and then also explained the intuition behind what I am doing. I hope this makes sense. I'd attach data, the full R code file, etc. if it helps but I do not know how I'd do that. Regarding the new observations in OLS, it should be clear from the intuition section, but basically I am using my in sample data to simulate out of sample forecasting and iterating from some t to T, to compare the accuracy of the models. Thanks. — eBopBob

antonioACR1 antonioACR1 · Accepted Answer · 2018-08-27T16:12:10

The only thing you need to understand is how to get predictions using lm, it's not necessary to add other details (without reproducible data you're only making it more difficult).

Create dummy data:

set.seed(123)
df<-data.frame(a=runif(10),b=runif(10),c=runif(10))
> print(df)
       a          b         c
1  0.2875775 0.95683335 0.8895393
2  0.7883051 0.45333416 0.6928034
3  0.4089769 0.67757064 0.6405068
4  0.8830174 0.57263340 0.9942698
5  0.9404673 0.10292468 0.6557058
6  0.0455565 0.89982497 0.7085305
7  0.5281055 0.24608773 0.5440660
8  0.8924190 0.04205953 0.5941420
9  0.5514350 0.32792072 0.2891597
10 0.4566147 0.95450365 0.1471136

Fit your model:

model<-lm(c~a+b,data=df)

Create new data:

new_df<-data.frame(a=runif(1),b=runif(1))

> print(new_df)
      a        b
1 0.9630242 0.902299

Get predictions from your new data:

prediction<- predict(model,new_df)   

> print(prediction)
    1 
0.8270997

In your case, the new data new_df will be your lagged data, but you have to make the appropriate changes, OR provide reproducible data as above if you want us to go through the details of your problem.

Hope this helps.

Error trying to produce forecast errors in R

1 Answers