I am trying to understand and implement LSTMs. I understand that one needs to define a sequence length T, and that training is performed in batches, so we feed the network several sequences of length T. The LSTM also needs a previous state as input, which, as I understand it, is initialized to zero.

My question is: is the state reset to zero after every sequence? For example, is the state vector carried forward within sequence 1 and then set to zero for the next sequence, or is it carried over to sequence 2? If it is carried over, how is this handled for unrelated sequences? Say I have samples from two different texts; it would not make sense to carry the state from text 1 into text 2, so how is this handled in practice?

Also, at test time, is the state vector initialized to zero and carried through the whole sequence, or is it reset after each sub-sequence?
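To make the two options I am asking about concrete, here is a minimal tf.keras sketch of what I mean (the sizes `T`, `batch_size`, `n_features`, and `n_units` are just placeholders I made up):

```python
import numpy as np
import tensorflow as tf

T, batch_size, n_features, n_units = 10, 4, 8, 32

# return_state=True makes the layer return the final hidden and cell states
# in addition to the output, so we can choose what to do with them.
lstm = tf.keras.layers.LSTM(n_units, return_state=True)

x1 = np.random.randn(batch_size, T, n_features).astype("float32")  # sequence 1
x2 = np.random.randn(batch_size, T, n_features).astype("float32")  # sequence 2

# Option A: reset between sequences -- each call starts from the zero state.
out1, h1, c1 = lstm(x1)
out2, h2, c2 = lstm(x2)  # the state from sequence 1 is NOT used here

# Option B: carry the state over -- feed the final state of sequence 1
# as the initial state of sequence 2 (presumably only sensible when the
# two sub-sequences come from the same text).
out2_carried, h2c, c2c = lstm(x2, initial_state=[h1, c1])
```

So in effect I am asking which of these two options is used during training (and at test time), and how option B avoids carrying state across unrelated texts.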
Note: I also tagged this with TensorFlow, since that is the framework I am using, and maybe someone from there can help me as well.