I am trying to understand and implement LSTMs. I understand that one needs to define a sequence length T, and that training is performed in batches, so we feed the network several sequences of length T. The LSTM also needs a previous state as input, which, as I understand it, is initialized to zero.

My question is: is the state reset to zero after every sequence? For example, is the state vector carried forward within sequence 1 and then set to zero for the next sequence, or is it carried over to sequence 2? If it is carried over, how is this handled for unrelated sequences? Say I have samples from two different texts; it would not make sense to carry the state from text 1 into text 2, so how is this handled in practice?

Also, at test time, is the state vector initialized to zero and carried through the whole sequence, or is it reset after each sub-sequence?
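To make the two options I am asking about concrete, here is a minimal tf.keras sketch of what I mean (the sizes `T`, `batch_size`, `n_features`, and `n_units` are just placeholders I made up):

```python
import numpy as np
import tensorflow as tf

T, batch_size, n_features, n_units = 10, 4, 8, 32

# return_state=True makes the layer return the final hidden and cell states
# in addition to the output, so we can choose what to do with them.
lstm = tf.keras.layers.LSTM(n_units, return_state=True)

x1 = np.random.randn(batch_size, T, n_features).astype("float32")  # sequence 1
x2 = np.random.randn(batch_size, T, n_features).astype("float32")  # sequence 2

# Option A: reset between sequences -- each call starts from the zero state.
out1, h1, c1 = lstm(x1)
out2, h2, c2 = lstm(x2)  # the state from sequence 1 is NOT used here

# Option B: carry the state over -- feed the final state of sequence 1
# as the initial state of sequence 2 (presumably only sensible when the
# two sub-sequences come from the same text).
out2_carried, h2c, c2c = lstm(x2, initial_state=[h1, c1])
```

So in effect I am asking which of these two options is used during training (and at test time), and how option B avoids carrying state across unrelated texts.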
Note: I also tagged this with TensorFlow, since that is the framework I am using, and maybe someone from there can help me as well.