7
votes

Correct me if I am wrong but according to the official Keras documentation, by default, the fit function has the argument 'shuffle=True', hence it shuffles the whole training dataset on each epoch.

However, the point of using recurrent neural networks such as LSTM or GRU is to use the precise order of each data so that the state of the previous data influence the current one.

If we shuffle all the data, all the logical sequences are broken. Thus I don't understand why there are so much examples of LSTM where the argument is not set to False. What is the point of using RNN without sequences ?

Also, when I set the shuffle option to False, my LSTM model is less performant eventhought there are dependencies between the data: I use the KDD99 dataset where the connections are linked.

1

1 Answers

11
votes

If we shuffle all the data, all the logical sequences are broken.

No, the shuffling happens on the batches axis, not on the time axis. Usually, your data for an RNN has a shape like this: (batch_size, timesteps, features)

Usually, you give your network not only one sequence to learn from, but many sequences. Only the order in which these many sequences are being trained on gets shuffled. The sequences themselves stay intact. Shuffling is usually always a good idea because your network shall only learn the training examples themselves, not their order.

This being said, there are cases where you have indeed only one huge sequence to learn from. In that case you have the option to still divide your sequence into several batches. If this is the case, you are absolutely right with your concern that shuffling would have a huge negative impact, so don't do that in this case!

Note: RNNs have a stateful parameter that you can set to True. In that case the last state of the previous batch will be passed to the following one which effectively makes your RNN see all batches as one huge sequence. So, absolutely do this, if you have a huge sequence over multiple batches.