I have a time series dataset. I am using Python, pandas and statsmodels to try to forecast the next month of my data.
I have daily data:
First I run auto_arima to see which orders I should use in my SARIMAX model:
from pmdarima import auto_arima
auto_arima(df['occurrences'], seasonal=True, m=7).summary()
and I obtain these results:
So, now I split the dataset into train and test data. I want to predict the next month, so I do:
train = df.loc[:'2020-04-30']
test = df.loc['2020-05-01':]
I train the model and compute the predictions:
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(df['occurrences'], order=(1, 1, 1))
results = model.fit()
results.summary()
start=len(train)
end=len(train)+len(test)
predictions = results.predict(start=start, end=end, dynamic=False, typ='levels')
But when I plot the predictions, I can see that they appear shifted by one day relative to the actual values:
ax = test['occurrences'].plot(legend=True,figsize=(12,6),title=title)
predictions.plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);
If I use the shift method to move all the predictions one day earlier:
ax = test['occurrences'].plot(legend=True,figsize=(12,6),title=title)
predictions.shift(-1).dropna().plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);
Now you can see that the predictions line up with the correct days. Why is this happening?
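To illustrate the symptom with a toy example: if each prediction were simply the previous day's observed value, I would get exactly this picture. This is only a minimal sketch in plain Python with made-up numbers, not my real data, and the names `actual`, `naive_pred` and `shifted_pred` are hypothetical. I am assuming, without having confirmed it, that my one-step-ahead predictions behave in a similar way.

```python
# Toy daily values (made up for illustration; not the real 'occurrences' data).
actual = [10, 14, 9, 17, 12, 20, 15]

# Naive one-step-ahead prediction: the prediction for day t is the value seen on day t-1.
# The first day has no previous value, so it is left as None.
naive_pred = [None] + actual[:-1]

# Compared day by day, the prediction looks like the actual series delayed by one day.
for day, (a, p) in enumerate(zip(actual, naive_pred)):
    print(day, a, p)

# Rough equivalent of predictions.shift(-1).dropna(): move every prediction one day earlier.
shifted_pred = naive_pred[1:]

# After the shift, the predictions coincide with the actual values for the same days.
assert shifted_pred == actual[:-1]
```

If this is what is going on, the shift would only be making the plot look better, not making the forecast more accurate.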