3 votes

I'm looking into the TensorFlow text_generation tutorial (https://www.tensorflow.org/tutorials/text/text_generation) and wondering why they shuffle the training data even though stateful is set to True for the GRU layer.

This contradicts the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN): "Note on using statefulness in RNNs: You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches."

Code snippets from tutorial:

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

tf.keras.layers.GRU(rnn_units,
                    return_sequences=True,
                    stateful=True,
                    recurrent_initializer='glorot_uniform')
You are right: when stateful is set to True, there is no need to shuffle the data, since a stateful RNN only makes sense with sequential, non-overlapping input sequences. Also, when creating batches, instead of chopping the text by sequence length we could chop it into n equal parts, where n is the batch size, giving one stream of consecutive input sequences for each position in the batch (a sketch follows these comments). – Sachin Prasad H S
What did you replace the code with in the end? – Anush
I also noticed this. I thought that maybe it was a typo, because they later go on to use statefulness when predicting, which is a valid use of it. – neuroguy123
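A minimal sketch of the batching scheme described in the first comment, assuming the tutorial's text_as_int array (the corpus encoded as integer IDs) and its seq_length and BATCH_SIZE values; all other names here are illustrative:

import numpy as np
import tensorflow as tf

BATCH_SIZE = 64
seq_length = 100

# Split the corpus into BATCH_SIZE contiguous streams so that sample i of
# batch w+1 continues exactly where sample i of batch w ended -- the
# one-to-one mapping the stateful docs require.
stream_len = len(text_as_int) // BATCH_SIZE
streams = np.reshape(text_as_int[:stream_len * BATCH_SIZE],
                     (BATCH_SIZE, stream_len))

# Cut each stream into consecutive (input, target) windows of seq_length,
# with targets shifted by one character, in order and without shuffling.
n_windows = (stream_len - 1) // seq_length
inputs = np.stack([streams[:, w * seq_length:(w + 1) * seq_length]
                   for w in range(n_windows)])
targets = np.stack([streams[:, w * seq_length + 1:(w + 1) * seq_length + 1]
                    for w in range(n_windows)])

# Each dataset element is already a full (BATCH_SIZE, seq_length) batch,
# so there is no .shuffle() and no .batch() call here.
dataset = tf.data.Dataset.from_tensor_slices((inputs, targets))

With this ordering, stateful=True is consistent: the GRU's final state after batch w is the correct initial state for batch w+1.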

1 Answer

0 votes

The documentation is wrong. I followed the steps from the TensorFlow documentation, but when I set stateful=False I got much better results with the shuffled data.
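A minimal sketch of the stateless variant the answer reports working better, assuming the tutorial's vocab_size, embedding_dim, rnn_units, BUFFER_SIZE, and BATCH_SIZE values:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=False,  # state resets to zero each batch
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size),
])

# Because no state is carried across batches, the tutorial's shuffled
# pipeline is now consistent with the model:
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)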