I have more of an abstract question about LSTMs.
So I have time series data: a single long series of, let's say, 30,000 data points.
For LSTMs in TensorFlow, the input shape is [batch_size, time_steps, features]. My batch size is equal to 1, since I only have one time series.
Now my question is about the role of the sequence length.
I could pass the entire time series into a TensorFlow LSTM with time_steps=1, in which case the data points would be fed into the LSTM one by one. Or I could use some number of time steps greater than 1 and make it a sequence-to-sequence model.
This way, I could plug in 7 data points at once (= a week of data) and predict 7 outputs (= one week into the future).
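To make the weekly framing concrete, here is roughly how I picture one input/target pair being shaped (a NumPy sketch; random numbers stand in for my real series, and the window of 7 is just the weekly example):

```python
import numpy as np

# random numbers standing in for my real 30,000-point series
series = np.random.randn(30_000).astype("float32")

# one sample: a week of inputs and the following week as the target,
# shaped as [batch_size, time_steps, features] = [1, 7, 1]
x = series[0:7].reshape(1, 7, 1)
y = series[7:14].reshape(1, 7, 1)
print(x.shape, y.shape)  # (1, 7, 1) (1, 7, 1)
```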
The thing is, we know that the hidden state of the LSTM will eventually remember the entire dataset (if we train for enough epochs). So what is the difference between the following two setups (both sketched in code below):
a) time_steps=1: I predict one period ahead and plug the prediction back into the network 7 times,
b) time_steps=7: I predict a sequence of 7 (= one week)?
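Roughly what I mean by the two setups, as a minimal tf.keras sketch (the layer sizes, the starting value, and the placeholder data are arbitrary, not from my actual code):

```python
import numpy as np
import tensorflow as tf

# (a) one-step-ahead model: one timestep in, one value out;
#     at prediction time the output is fed back in as the next input
model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(1, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])

last_value = np.array([[[0.5]]], dtype="float32")  # shape (1, 1, 1)
forecast = []
for _ in range(7):  # roll the prediction forward one week
    nxt = model_a.predict(last_value, verbose=0)   # shape (1, 1)
    forecast.append(float(nxt[0, 0]))
    last_value = nxt.reshape(1, 1, 1)
# note: as written, each predict call starts from a fresh hidden state;
# carrying state across calls would need a stateful LSTM

# (b) sequence-to-sequence model: a week in, a week out
model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(7, 1)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(1),  # one output per timestep -> (batch, 7, 1)
])
week = np.random.randn(1, 7, 1).astype("float32")
next_week = model_b.predict(week, verbose=0)       # shape (1, 7, 1)
```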