
I have more of an abstract question about LSTMs.

So I have time series data: a single long series of, let's say, 30,000 data points.

For LSTMs in TensorFlow, the input shape is [batch_size, time_steps, features]. My batch size is equal to 1 since I only have one time series.
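To make that shape concrete, here is a minimal sketch (assuming a univariate series and plain NumPy; the variable names are just illustrative) of how one long series maps onto [batch_size, time_steps, features]:

    import numpy as np

    # Hypothetical univariate series of 30,000 points.
    series = np.random.randn(30_000).astype(np.float32)

    # One series, one feature: batch_size=1, features=1,
    # time_steps = length of the series.
    x = series.reshape(1, -1, 1)
    print(x.shape)  # (1, 30000, 1)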

Now my question is about the role of the sequence length.

I could pass the entire time series into a TensorFlow LSTM with time_steps=1, in which case the points would be fed into the LSTM one by one. Or I could use some number of time steps greater than 1 and make it a sequence-to-sequence model.

This way, I could plug in 7 data points at once (= a week of data) and predict 7 outputs (= one week into the future).

The thing is, we know that the hidden state of the LSTM will eventually capture the entire dataset (given enough training epochs). So what is the difference between

a) time_steps=1: I predict one period ahead and feed the prediction back into the network 7 times,

b) time_steps=7: I predict a sequence of 7 (= one week)?
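To make the two setups concrete, here is a minimal untrained sketch of both (the layer sizes, the WINDOW constant, and the roll_forward helper are my own illustrative assumptions, not anything fixed by the question):

    import numpy as np
    import tensorflow as tf

    WINDOW = 7  # one week of daily points

    # (a) time_steps=1: a model that predicts the next point,
    #     then is rolled forward 7 times on its own predictions.
    one_step = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(1, 1)),
        tf.keras.layers.Dense(1),
    ])

    def roll_forward(model, last_point, steps=WINDOW):
        """Feed each prediction back in as the next input."""
        preds = []
        x = np.reshape(last_point, (1, 1, 1)).astype(np.float32)
        for _ in range(steps):
            y = model.predict(x, verbose=0)   # shape (1, 1)
            preds.append(float(y[0, 0]))
            x = y.reshape(1, 1, 1)            # prediction becomes next input
        return preds

    # (b) time_steps=7: a model that maps a 7-step window
    #     directly to 7 outputs in a single forward pass.
    seq_model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(WINDOW, 1)),
        tf.keras.layers.Dense(WINDOW),        # 7 values out at once
    ])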

Comment: So in your dataset, one data point corresponds to one day? – Nipun Wijerathne

1 Answer


In practical terms, seq2seq models (i.e., your second example) are far more difficult to train. In anthropomorphic terms, your LSTM faces considerable pressure: it has to encode all of the inputs into a single vector, a process that discards quite a bit of information, and then somehow decode that single vector into 7 outputs in the correct sequential order.
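To see where that pressure comes from, here is a minimal sketch of one common Keras encoder-decoder layout for this kind of forecast (the layer sizes and the WINDOW constant are arbitrary assumptions on my part): the encoder LSTM compresses the whole input window into one state vector, which is then repeated and decoded back into 7 ordered outputs.

    import tensorflow as tf

    WINDOW = 7

    seq2seq = tf.keras.Sequential([
        # Encoder: the entire 7-step input window is squeezed into one 32-dim vector.
        tf.keras.layers.LSTM(32, input_shape=(WINDOW, 1)),
        # That single vector is copied 7 times, one copy per output step.
        tf.keras.layers.RepeatVector(WINDOW),
        # Decoder: unrolls the repeated vector back into a 7-step sequence.
        tf.keras.layers.LSTM(32, return_sequences=True),
        # One scalar prediction per output time step.
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
    ])
    seq2seq.compile(optimizer="adam", loss="mse")

Everything the decoder produces has to be recovered from that one 32-dimensional vector, which is exactly the bottleneck described above.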