Considering the diagrams that you have posted: as you see, each cell uses the output of its predecessor cell. For example, when you want to feed x2 into your LSTM network, you will have to use h1 from the previous cell (i.e., the output from the previous timestep) along with the vector for x2. Feeding these two will give you h2, which is then propagated forward to the next cell. This is an example of what is going on at timestep t=2.
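A minimal sketch of that single step, assuming PyTorch's nn.LSTMCell and made-up dimensions (the names x1, x2, h1, c1 here are just for illustration, not from your diagrams):
import torch
import torch.nn as nn

input_dim, hidden_dim = 8, 16           # illustrative sizes
cell = nn.LSTMCell(input_dim, hidden_dim)

x1 = torch.randn(1, input_dim)          # input vector at timestep t=1 (batch of 1)
x2 = torch.randn(1, input_dim)          # input vector at timestep t=2
h0 = torch.zeros(1, hidden_dim)         # initial hidden state
c0 = torch.zeros(1, hidden_dim)         # initial cell state (LSTM only)

h1, c1 = cell(x1, (h0, c0))             # timestep t=1
h2, c2 = cell(x2, (h1, c1))             # timestep t=2: h1 is fed in together with x2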
A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. In tutorials, though, these networks are drawn unrolled for the sake of understandability. That is not exactly what happens in practice: the pictured cells are not separate networks, because they all share the same parameters, which are updated with each backpropagation iteration.
To make it more understandable, consider the code snippet below.
import torch
import torch.nn as nn

seq_len, input_dim, hidden_dim = 5, 8, 16   # illustrative sizes

# X is the input sequence (e.g., word embedding vectors), one vector per timestep
X = torch.randn(seq_len, input_dim)
# steps indexes the timesteps of the input sequence
steps = range(seq_len)
# h0 and c0 are zero state vectors (a common choice) fed into the first RNN cell
h0 = torch.zeros(1, hidden_dim)
c0 = torch.zeros(1, hidden_dim)
hidden = (h0, c0)
# a single cell (here an LSTM cell); its parameters are reused at every timestep
RNN = nn.LSTMCell(input_dim, hidden_dim)
# h_out collects the hidden states that the RNN network outputs
h_out = list()
for i in steps:
    # advance the RNN by one timestep, reusing the same cell (same parameters)
    hidden = RNN(X[i].unsqueeze(0), hidden)
    hy, cy = hidden
    h_out.append(hy)
Here, RNN(., .) is an RNN cell (an LSTM cell in this snippet; a GRU or vanilla RNN cell works the same way, except its state is a single vector rather than an (h, c) pair) that holds a set of trainable parameters such as weight matrices and biases. These parameters are shared across timesteps: the very same weights are applied to every X[i] and hidden pair fed into the cell at each iteration, and they are learned from all of those timesteps together.
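One way to see this sharing concretely, as a sketch continuing the snippet above (the loss here is arbitrary and only for illustration):
# the cell owns exactly one set of weights, no matter how long the sequence is
print([name for name, _ in RNN.named_parameters()])
# ['weight_ih', 'weight_hh', 'bias_ih', 'bias_hh']

# backpropagating a loss computed from all timesteps sends gradient
# contributions from every timestep into those same shared weights
loss = torch.stack(h_out).sum()
loss.backward()
print(RNN.weight_ih.grad.shape)   # one gradient tensor, accumulated over all steps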
So, back to your question: an RNN network is effectively multiple copies of a single RNN cell, and it is that one cell's shared parameters that get trained as training proceeds.
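If it helps, here is a small sketch (continuing the same snippet and illustrative dimensions) showing that PyTorch's sequence-level nn.LSTM holds exactly one copy of the cell parameters, and that looping a single nn.LSTMCell over the timesteps reproduces its output once the weights are copied over:
lstm = nn.LSTM(input_dim, hidden_dim)      # processes the whole sequence in one call

# copy the single cell's weights into the sequence-level module
with torch.no_grad():
    lstm.weight_ih_l0.copy_(RNN.weight_ih)
    lstm.weight_hh_l0.copy_(RNN.weight_hh)
    lstm.bias_ih_l0.copy_(RNN.bias_ih)
    lstm.bias_hh_l0.copy_(RNN.bias_hh)

# nn.LSTM expects input of shape (seq_len, batch, input_dim)
out, _ = lstm(X.unsqueeze(1), (h0.unsqueeze(0), c0.unsqueeze(0)))

# the unrolled loop above and the one-shot call produce the same hidden states
print(torch.allclose(out, torch.stack(h_out), atol=1e-6))   # True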