I'm looking into the TensorFlow text_generation tutorial (https://www.tensorflow.org/tutorials/text/text_generation) and wondering why it shuffles the training data even though stateful is set to True for the GRU layer.
This seems to contradict the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN): "Note on using statefulness in RNNs: You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches."
Code snippets from the tutorial:
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
tf.keras.layers.GRU(rnn_units,
                    return_sequences=True,
                    stateful=True,
                    recurrent_initializer='glorot_uniform')
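For concreteness, here is a minimal toy sketch of the one-to-one mapping the documentation describes (the data and variable names are my own, not from the tutorial): sample j of batch k+1 must be the continuation of sample j of batch k.

import tensorflow as tf

text = tf.range(24)                                       # toy "text" of 24 tokens
batch_size, seq_len = 2, 4

streams = tf.reshape(text, (batch_size, -1))              # (2, 12): two parallel streams of the text
windows = tf.reshape(streams, (batch_size, -1, seq_len))  # (2, 3, 4): consecutive windows per stream
batches = tf.transpose(windows, (1, 0, 2))                # (3, 2, 4): 3 batches of 2 samples each

for k, batch in enumerate(batches):
    print("batch", k, batch.numpy().tolist())
# batch 0 [[0, 1, 2, 3],  [12, 13, 14, 15]]
# batch 1 [[4, 5, 6, 7],  [16, 17, 18, 19]]  <- row j continues row j of batch 0
# batch 2 [[8, 9, 10, 11], [20, 21, 22, 23]]

As far as I can tell, dataset.shuffle() over individual sequences destroys exactly this alignment, so the GRU state carried over from one batch becomes the initial state for an unrelated sequence in the next batch. Why does the tutorial shuffle anyway?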