I am new to LSTMs. While going through the "Understanding Keras LSTMs" question, I had some doubts related to a beautiful answer by Daniel Moller.
Here are some of my doubts:
1. There are two ways specified under the "Achieving one to many" section, where it's written that we can use `stateful=True` to recurrently take the output of one step and serve it as the input of the next step (needs `output_features == input_features`). In the "One to many with repeat vector" diagram, the repeated vector is fed as input at every time step, whereas in "One to many with stateful=True" the output of one step is fed as the input of the next time step. So, aren't we changing the way the layers work by using `stateful=True`? Which of these two approaches (using the repeat vector OR feeding the previous time step's output as the next input) should be followed when building an RNN?
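To make the contrast between the two data flows concrete, here is a minimal runnable sketch. A dummy `step` function (a made-up stand-in for one Keras LSTM step) is used so it runs without TensorFlow; all values are made up for illustration:

```python
# Minimal sketch contrasting the two one-to-many data flows.
# step() is a dummy stand-in for one recurrent step of a real
# Keras LSTM, so this runs without TensorFlow installed.

def step(x):
    # placeholder for one recurrent step of the model
    return [v + 1.0 for v in x]

initial = [0.5, 0.5]          # the single input vector

# Pattern 1: repeat vector -- the SAME input is fed at every step.
repeat_outputs = [step(initial) for _ in range(4)]

# Pattern 2: stateful loop -- each output becomes the NEXT input
# (this is why output_features must equal input_features).
x = initial
loop_outputs = []
for _ in range(4):
    x = step(x)
    loop_outputs.append(x)
```

In pattern 1 every step sees the same vector, while in pattern 2 each step sees the previous step's output, which is exactly the difference the two diagrams show.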
2. Under the "One to many with stateful=True" section (used to change the behaviour of one to many), in the code with the manual loop for prediction, how will we know the `steps_to_predict` variable, since we don't know the output sequence length in advance?

I also did not understand the way the entire model uses the `last_step` output to generate the `next_step` output. It has confused me about the working of the `model.predict()` function. I mean, doesn't `model.predict()` predict the entire output sequence at once, rather than looping through the number of output steps (whose value I still don't know) and calling `model.predict()` to predict a specific time step's output in each iteration?

3. I couldn't understand the "Many to many" case at all. Any other link would be helpful.

4. I understand that we use `model.reset_states()` to make sure that a new batch is independent of the previous batch. But do we manually create batches of a sequence such that one batch follows another, or does Keras in `stateful=True` mode automatically divide the sequence into such batches? If it's done manually, why would anyone divide the dataset into batches in which one part of a sequence is in one batch and the rest in the next batch?
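For what it's worth, the stopping-point doubt from question 2 is often handled with an end-of-sequence token plus a hard cap. Here is a sketch of that manual one-step-at-a-time loop, where `predict_step`, `END_TOKEN`, and `MAX_STEPS` are all made-up stand-ins (a real version would call `model.predict(last_step)` on a stateful model):

```python
# Sketch of the manual prediction loop with a stopping rule.
# predict_step is a made-up stand-in for model.predict(last_step);
# END_TOKEN and MAX_STEPS are assumed conventions, not Keras API.

END_TOKEN = 3     # hypothetical end-of-sequence marker
MAX_STEPS = 10    # hard cap if the model never emits END_TOKEN

def predict_step(x):
    # placeholder for one call to model.predict on the last step
    return x + 1

last_step = 0
outputs = []
for _ in range(MAX_STEPS):
    last_step = predict_step(last_step)
    outputs.append(last_step)
    if last_step == END_TOKEN:   # the loop decides when to stop
        break
```

So the loop length is not known in advance; the model's own output (or the hard cap) terminates it.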
5. Finally, what are the practical implementations or examples/use-cases where `stateful=True` would be used (because this seems to be something unusual)? I am learning LSTMs, and this is the first time I've been introduced to `stateful` in Keras.

Can anyone help me with these questions so that I can be clear on LSTM implementation in Keras?
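On the batching doubt in question 4: one way to picture a manual split is that batch b+1 starts exactly where batch b ended, so the LSTM state carried across batches is the true continuation of the same sequence. A tiny sketch of that split (toy data, nothing Keras-specific):

```python
# Sketch: manually splitting one long sequence into consecutive
# batches for stateful training. Each batch continues exactly
# where the previous one stopped, which is what makes keeping
# the inner state between batches meaningful.

sequence = list(range(12))     # one long toy sequence
steps_per_batch = 4

batches = [sequence[i:i + steps_per_batch]
           for i in range(0, len(sequence), steps_per_batch)]
```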
EDIT: I am asking some of these for clarification of the current answer, and some about my remaining doubts.
A. So, basically, `stateful` lets us keep OR reset the inner state after every batch. Then how would the model learn if we keep resetting the inner state again and again after each trained batch? Does resetting truly mean resetting the parameters (used in computing the hidden state)?
B. In the line "If stateful=False: automatically resets inner state, resets last output step", what did you mean by resetting the last output step? I mean, if every time step produces its own output, then what does resetting the last output step mean, and why only the last one?
C. In response to Question 2 and the 2nd point of Question 4, I still didn't get your "manipulate the batches between each iteration" and the need for `stateful` (last line of the answer to Question 2), which only resets the states. I got to the point that we don't know the input for every output generated at a time step. So you break the sequences into sequences of only one step and then use `new_step = model.predict(last_step)`, but then how do you know how long you need to do this again and again (there must be a stopping point for the loop)? Also, please explain the `stateful` part (in the last line of the answer to Question 2).
D. In the code under "One to many with stateful=True", it seems that the for loop (the manual loop) used for predicting the next word is used only at test time. Does the model incorporate that itself at train time, or do we manually need to use this loop at train time as well?
E. Suppose we are doing some machine translation job. I think the breaking of sequences will occur after the entire input (the language to translate) has been fed to the input time steps, and then the generation of outputs (the translated language) at each time step will take place via the manual loop, because now we have finished with the inputs and start producing the output at each time step using the iteration. Did I get it right?
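The flow described in E could be sketched like this, with `encode` and `decode_step` as made-up stand-ins for a real encoder model and one stateful decoder step (toy values, no Keras required):

```python
# Sketch of "consume all inputs first, then generate step by step".
# encode() and decode_step() are hypothetical stand-ins for a real
# encoder model and one stateful decoder step.

def encode(source_tokens):
    # placeholder: run the encoder over the whole input sentence
    return sum(source_tokens)

def decode_step(state, last_output):
    # placeholder: one decoder step given state and last output
    return last_output + 1

state = encode([1, 2, 3])     # feed the ENTIRE input first
output = 0                    # e.g. a start-of-sequence token
outputs = []
for _ in range(3):            # some stopping rule would apply here
    output = decode_step(state, output)
    outputs.append(output)
```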
F. Since the default working of LSTMs requires the three things mentioned in the answer, in the case of breaking of sequences, are `current_input` and `previous_output` fed the same vectors, because their values are the same when no current input is available?
G. Under "Many to many with stateful=True", in the "Predicting:" section, the code reads:

```
predicted = model.predict(totalSequences)
firstNewStep = predicted[:,-1:]
```

Since the manual loop for finding the very next word in the current sequence hasn't been used up to this point, how do I know the count of the time steps predicted by `model.predict(totalSequences)`, so that the last step from `predicted` (`predicted[:,-1:]`) can later be used for generating the rest of the sequence? I mean, how do I know the number of steps produced by `predicted = model.predict(totalSequences)` before the manual for loop is used later?
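For question G, a sketch of the shapes may help: a many-to-many model's `predict` returns one output step per input step, shape `(batch, time_steps, features)`, so the number of predicted steps is already known, being equal to the input length. Using plain nested lists instead of a real NumPy array (so this runs anywhere):

```python
# Sketch of the shape of model.predict(totalSequences) for a
# many-to-many model, using nested lists in place of the real
# NumPy array: predicted[b][t] is the output for batch item b
# at time step t, so len(predicted[0]) equals the INPUT length.

batch, time_steps, features = 2, 5, 3
predicted = [[[0.0] * features for _ in range(time_steps)]
             for _ in range(batch)]

steps_predicted = len(predicted[0])        # known in advance
# list equivalent of predicted[:, -1:] (keeps the time axis)
first_new_step = [seq[-1:] for seq in predicted]
```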
EDIT 2:
I. In answer D, I still didn't get how I will train my model. I understand that using the manual loop (during training) can be quite painful, but then, if I don't use it, how will the model get trained in circumstances where "we want the 10 future steps; we cannot output them at once because we don't have the necessary 10 input steps"? Will simply using `model.fit()` solve my problem?
II. In the last paragraph of answer D: "You could train step by step using train_on_batch only in the case you have the expected outputs of each step. But otherwise I think it's very complicated or impossible to train."

Can you explain this in more detail? What does "step by step" mean? If I don't have (or do have) the outputs for the later sequences, how will that affect my training? Do I still need the manual loop during training? If not, will the `model.fit()` function work as desired?
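On the "step by step" training doubt: when the expected output of every step is known, the training pairs are just the same sequence shifted by one step (often called teacher forcing), so a single `model.fit` call can train all steps at once and no manual loop is needed at train time. A tiny sketch of that shifting (toy data, not a real training run):

```python
# Sketch of "step by step" targets via teacher forcing: when the
# expected output of EVERY step is known, inputs and targets are
# the same sequence shifted by one step, so one fit call can
# train all steps at once (no manual prediction loop needed).

sequence = [10, 20, 30, 40, 50]   # toy fully-known sequence

x_train = sequence[:-1]   # steps 0..n-2 as inputs
y_train = sequence[1:]    # steps 1..n-1 as targets
```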
III. I interpreted the "repeat" option as using the repeat vector. Wouldn't using the repeat vector be good only for the one-to-many case and not suitable for the many-to-many case, because the latter will have many input vectors to choose from (to be used as a single repeated vector)? How would you use the repeat vector for the many-to-many case?