
I have a time series dataset. I am using Python, pandas and statsmodels to try to forecast the next month of my data.

I have daily data:

[image: plot of the daily data]

First I run auto_arima to see which orders I should use in my SARIMAX model:

from pmdarima import auto_arima  # assuming the pmdarima package
auto_arima(df['occurrences'], seasonal=True, m=7).summary()

and I obtain these results:

[image: auto_arima summary output]

Now I split the dataset into train and test data. I want to predict the next month, so I do:

train = df.loc[:'2020-04-30']
test = df.loc['2020-05-01':]

I train the model and generate predictions for the test period:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(df['occurrences'], order=(1, 1, 1))
results = model.fit()
results.summary()
start = len(train)
end = len(train) + len(test)
predictions = results.predict(start=start, end=end, dynamic=False, typ='levels')

But when I plot the predictions, they appear to be shifted one day later than the actual values:

ax = test['occurrences'].plot(legend=True,figsize=(12,6),title=title)
predictions.plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);

[image: plot of the predictions against the test data]

If I use the shift command to move all the predictions one day earlier:

ax = test['occurrences'].plot(legend=True,figsize=(12,6),title=title)
predictions.shift(-1).dropna().plot(legend=True)
ax.autoscale(axis='x',tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);

[image: plot of the shifted predictions against the test data]

Now the predictions line up with the correct days. Why is this happening?

1 Answer

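The model is giving you the correct predictions on the correct days. With dynamic=False, and with the model fitted on the full series (df['occurrences'] rather than train), each prediction is a one-step-ahead forecast that uses the actual observations up to the previous day. ARIMA models are relatively simple and predict the future from the present and past, so when the model sees a large value today (for example, in observation 11), its prediction for tomorrow is larger. The predicted curve therefore looks like the observed curve delayed by one day. Shifting it back only makes it look like a better fit; it does not make the forecast any better.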

For example, see this StackExchange question and answer: https://stats.stackexchange.com/questions/330928/time-series-prediction-shifted
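
The effect can be reproduced without statsmodels. The sketch below is a simplified stand-in for the fitted model, not the asker's ARIMA(1, 1, 1). It assumes the series behaves like a random walk and that the one-step-ahead prediction for each day is simply the previous day's observation. The synthetic data and the names y, pred and mae are all hypothetical and used only for illustration.

```python
import random

random.seed(0)

# Synthetic daily series (assumption): a random walk, for which the best
# one-step-ahead guess for tomorrow is today's value.
y = [100.0]
for _ in range(199):
    y.append(y[-1] + random.gauss(0, 5))

# One-step-ahead prediction for day t uses only data up to day t-1.
# Under the random-walk assumption that is yesterday's observation.
pred = [None] + y[:-1]  # pred[t] = y[t-1]


def mae(a, b):
    """Mean absolute error over the positions where both values exist."""
    pairs = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
    return sum(abs(p - q) for p, q in pairs) / len(pairs)


# Error when each prediction is compared with the day it was made for.
aligned_error = mae(pred, y)

# Error after moving every prediction one day earlier, as shift(-1) does.
shifted = pred[1:] + [None]  # shifted[t] = pred[t+1] = y[t]
shifted_error = mae(shifted, y)

print(f"aligned MAE: {aligned_error:.2f}")
print(f"shifted MAE: {shifted_error:.2f}")
```

The shifted error is zero here only because the shifted "prediction" for a day is that day's own observation. It matches the plot, but it could not have been produced before the value was known, so the unshifted curve is the honest forecast.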