I'm trying to follow the time series tutorial here (using my own dataset):
https://www.analyticsvidhya.com/blog/2018/02/time-series-forecasting-methods/
Surprisingly, I was able to get as far as Part 7: ARIMA without trouble. In this section, though, I am stumbling quite a bit: all the values in the prediction column are NaN.
In the terminal, I see
a date index has been provided but it has no associated frequency information and so will be ignored when forecasting
My test data set has a few date gaps where no transactions occurred, so I fill them with
test=test.set_index('DATE').asfreq('D', fill_value=0)
I also do the same thing with my ARIMA dataset, so that its index matches the test set.
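For reference, a minimal sketch of what the asfreq step does, using a made-up three-row frame (raw and filled are hypothetical names, not from my real data). It reindexes to a daily calendar and puts 0 on the days that had no row:

```python
import pandas as pd

# Hypothetical transactions with two missing days (2018-06-23 and 2018-06-24)
raw = pd.DataFrame({
    "DATE": pd.to_datetime(["2018-06-21", "2018-06-22", "2018-06-25"]),
    "COUNT": [1, 3, 2],
})

# Reindex to a daily calendar; days with no transactions get COUNT = 0
filled = raw.set_index("DATE").asfreq("D", fill_value=0)

print(filled)
print(filled.index.freq)  # the resulting index carries a daily frequency
```

Only the newly created rows receive the fill value; existing counts are left untouched.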
The rest of the relevant code is as follows:
import pandas as pd
import statsmodels.api as sm

train=df[0:180]
test=df[180:]
SARIMA=test.copy()
fit=sm.tsa.statespace.SARIMAX(train['COUNT'], order=(1,1,1), seasonal_order=(0,0,0,5)).fit()
SARIMA['SARIMA']=fit.predict(start=0, end=93, dynamic=True)
print(SARIMA)
print(test)
In the print output, the index for the test set and the ARIMA set are the same. The ARIMA set contains a column SARIMA, which should hold the predictions, except they are all NaN. What am I missing?
test
DATE        COUNT
2018-06-21  1
2018-06-22  3
..
2018-11-21  3
2018-11-22  4
SARIMA
DATE        COUNT  SARIMA
2018-06-21  1      NaN
2018-06-22  3      NaN
..
2018-11-21  3      NaN
2018-11-22  4      NaN
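One thing that could produce an all-NaN column like this, assuming the predictions come back labeled by integer position (0 to 93) while the SARIMA frame is indexed by date: pandas aligns on index labels when a Series is assigned to a column, so labels that do not match leave NaN. A minimal sketch with made-up numbers (frame, preds, and the column names are hypothetical, and I have not confirmed this is what happens in my real run):

```python
import pandas as pd

# Hypothetical date-indexed frame standing in for the SARIMA/test copy
dates = pd.date_range("2018-06-21", periods=4, freq="D")
frame = pd.DataFrame({"COUNT": [1, 3, 3, 4]}, index=dates)

# Hypothetical predictions labeled by integer position, not by date
preds = pd.Series([1.2, 2.8, 3.1, 3.9], index=range(4))

# Column assignment aligns on index labels; no integer label matches a date,
# so every row of this column ends up NaN
frame["BY_LABEL"] = preds

# Assigning the raw values bypasses alignment (lengths must match)
frame["BY_POSITION"] = preds.to_numpy()

print(frame)
```

Comparing the index of the predicted Series with the index of the target frame is a quick way to check whether this applies.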
edit: For some reason, statsmodels simply cannot detect the index frequency. I've tried
SARIMA=SARIMA.set_index('DATE').asfreq('D',fill_value=0)
SARIMA.index=pd.to_datetime(SARIMA.index)
SARIMA.index=pd.DatetimeIndex(SARIMA.index.values, freq='D')
but the warning always appears.
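As far as I understand it, the warning concerns the index of the series handed to the model (here train['COUNT']), not the frame the predictions are stored in, so the frequency worth inspecting is the one on the training index. A small sketch of how to check it, assuming a frame indexed by DATE before the train/test split (the data below is made up):

```python
import pandas as pd

# Hypothetical daily data standing in for df, indexed by DATE before splitting
dates = pd.date_range("2018-01-01", periods=10, freq="D")
df = pd.DataFrame({"COUNT": range(10)}, index=dates)

# Positional split, as in the post but written with iloc
train = df.iloc[0:7]
test = df.iloc[7:]

# Contiguous slices of a regular DatetimeIndex keep its frequency
print(train.index.freq, test.index.freq)

# Rebuilding the index from raw values drops the frequency unless it is passed
no_freq = pd.DatetimeIndex(train.index.values)
with_freq = pd.DatetimeIndex(train.index.values, freq="D")
print(no_freq.freq, with_freq.freq)
```

If train.index.freq prints None, or train still has its default integer index, the model has no date frequency to work with.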
edit2: I went as far as making a new dataset in Excel:
DATE COUNT
2018/01/01 1
2018/01/02 2
..
2018/01/10 3
2018/01/11 4
I created the model with the same lines as above, except with enforce_stationarity and enforce_invertibility set to False. All the predictions are still NaN.
edit3: Using the fake Excel dataset, I've come one step closer. Passing start='2018-01-01' and end='2018-01-21' yielded predictions of all 0s, which is better than NaN. Can anyone make sense of these results?
edit4: Setting dynamic=False returned reasonable predictions. Clearly I'm no statistician.
Nan values – jeevs
df.isnull().values.any() yields False, so I don't think there are any NaNs – machump