
I am trying to understand and implement LSTMs. I understand that one needs to define a sequence length T, and that training is performed in batches, so we feed the network several sequences of length T. The LSTM also needs a previous state as input, which, as I understand it, is initialized to zero.

My question is: is the state reset to zero after every sequence? For example, the state vector is carried forward within sequence 1; do I then set it back to zero for the next sequence, or is it carried over into sequence 2? If it is carried over, how is this handled for unrelated sequences? For example, if I have samples from two different texts, it would not make sense to carry the state from text 1 into text 2. How is this handled in practice?

Also, at test time, is the state vector initialized to zero and carried through the whole sequence, or is it reset after each sub-sequence?
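For concreteness, here is roughly the setup I mean (a minimal sketch; the shapes and names such as T, batch_size, and num_units are just illustrative):

```python
import tensorflow as tf

T, batch_size, input_dim, num_units = 20, 32, 10, 64  # illustrative

# A batch of sequences, each of length T
inputs = tf.placeholder(tf.float32, [batch_size, T, input_dim])

cell = tf.nn.rnn_cell.LSTMCell(num_units)

# The state is carried across the T steps within one call;
# my question is what happens to it between sequences/batches.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```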

Note: I also tagged this with TensorFlow, since that is the framework I am using, and maybe someone there can help me as well.


2 Answers


In TensorFlow, I am 95% sure the starting state is reset to zero for every sequence, for every element in your batch, and between batches. (The remaining 5% is because of the "never say never" rule. :)

EDIT:

I should probably elaborate. TensorFlow works by first constructing a graph and then pushing your data through it. If you look at the recurrent graph you constructed, I believe you'll see that its head (the first state) is connected to a zero tensor. This means that every time you push data through the graph (e.g. via sess.run()), it gets a fresh zero as its initial state, so the old state from previous runs is forgotten.
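As a minimal sketch of what I mean (shapes are made up): if you do not pass an explicit initial_state to tf.nn.dynamic_rnn, it builds a zero state inside the graph, and that zero is re-evaluated on every run. You can also build it explicitly with cell.zero_state():

```python
import tensorflow as tf

batch_size, T, input_dim, num_units = 32, 20, 10, 64  # illustrative

inputs = tf.placeholder(tf.float32, [batch_size, T, input_dim])
cell = tf.nn.rnn_cell.LSTMCell(num_units)

# cell.zero_state() builds zero tensors in the graph; passing them
# (or passing nothing and supplying dtype instead) means every
# sess.run() starts from zeros, so nothing carries over between runs.
init_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         initial_state=init_state)
```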


It depends on how you implement batch processing for the RNN. In principle, the state matters while you are still processing a single series of data, so you should not reset it until the whole sequence is finished. Usually that means you reset when an epoch (or a complete sequence) finishes, not after every batch.
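As a sketch of what this can look like in practice (hedged: the data iterator get_chunks is hypothetical, and all shapes are illustrative), you feed the final state of one batch back in as the initial state of the next, and reset to zeros only at a sequence boundary, e.g. when a new text starts:

```python
import numpy as np
import tensorflow as tf

batch_size, T, input_dim, num_units = 32, 20, 10, 64  # illustrative

inputs = tf.placeholder(tf.float32, [batch_size, T, input_dim])
# Placeholders for the LSTM's cell state (c) and hidden state (h)
c_in = tf.placeholder(tf.float32, [batch_size, num_units])
h_in = tf.placeholder(tf.float32, [batch_size, num_units])
init_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

cell = tf.nn.rnn_cell.LSTMCell(num_units)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         initial_state=init_state)

zeros = (np.zeros([batch_size, num_units], np.float32),
         np.zeros([batch_size, num_units], np.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = zeros  # start of the first text: zero state
    for chunk, is_new_text in get_chunks():  # hypothetical iterator
        if is_new_text:
            state = zeros  # unrelated text: do not carry state over
        # Run one batch and keep its final state for the next batch
        state = sess.run(final_state,
                         feed_dict={inputs: chunk,
                                    c_in: state[0],
                                    h_in: state[1]})
```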