3 votes

Currently I am learning about RNNs, especially LSTM networks. I have read a lot of topics, including this one, and I still have some misunderstandings. The image below is from this article and shows a single RNN cell unrolled in time.

[figure: a single RNN cell unrolled in time]

1. Do I understand correctly that an RNN cell is not a single neuron in the feedforward-network sense, but a whole layer of neurons inside it?
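To make my mental picture concrete, here is a rough sketch of what I imagine one vanilla RNN cell computes (PyTorch, with made-up sizes; the names W_xh and W_hh are my own):

import torch

input_dim, hidden_dim = 10, 20     # made-up sizes

# my understanding: one cell holds whole weight matrices,
# i.e., hidden_dim units, not a single scalar neuron
W_xh = torch.randn(input_dim, hidden_dim)
W_hh = torch.randn(hidden_dim, hidden_dim)
b = torch.zeros(hidden_dim)

x_t = torch.randn(input_dim)       # input at timestep t
h_prev = torch.zeros(hidden_dim)   # hidden state from timestep t-1

h_t = torch.tanh(x_t @ W_xh + h_prev @ W_hh + b)
print(h_t.shape)                   # torch.Size([20]) -- a layer-sized output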

Another image from the article shows a single LSTM cell unrolled in time.

[figure: a single LSTM cell unrolled in time]

2. Following the logic of the first question, is an LSTM cell likewise not a single neuron in the feedforward-network sense, but a set of four layers of neurons inside it?
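Again, a rough sketch of the four internal layers as I understand them (the standard LSTM gate equations, with made-up sizes):

import torch

input_dim, hidden_dim = 10, 20
concat_dim = input_dim + hidden_dim

# four weight matrices = the four "layers" inside one LSTM cell
W_f, W_i, W_g, W_o = (torch.randn(concat_dim, hidden_dim) for _ in range(4))
b_f, b_i, b_g, b_o = (torch.zeros(hidden_dim) for _ in range(4))

x_t = torch.randn(input_dim)
h_prev, c_prev = torch.zeros(hidden_dim), torch.zeros(hidden_dim)
z = torch.cat([x_t, h_prev])       # concatenated input and previous state

f = torch.sigmoid(z @ W_f + b_f)   # forget gate
i = torch.sigmoid(z @ W_i + b_i)   # input gate
g = torch.tanh(z @ W_g + b_g)      # candidate cell state
o = torch.sigmoid(z @ W_o + b_o)   # output gate

c_t = f * c_prev + i * g           # updated cell state
h_t = o * torch.tanh(c_t)          # new hidden state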

3. Roughly speaking, can we say that an RNN (or LSTM) layer (e.g., in the sense of Keras layers) is what we call a 'cell'?
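For question 3, if I read the Keras API correctly, it seems to distinguish the two, which is part of my confusion (a sketch with made-up sizes):

import tensorflow as tf

cell = tf.keras.layers.LSTMCell(32)       # handles a single timestep
layer = tf.keras.layers.RNN(cell)         # runs the cell over a sequence
fused = tf.keras.layers.LSTM(32)          # the common fused LSTM layer

x = tf.random.normal((4, 10, 8))          # (batch, timesteps, features)
print(layer(x).shape)                     # (4, 32)
print(fused(x).shape)                     # (4, 32)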

Thanks in advance for your answers!

1
A set of RNN/LSTM/GRU cells that are bound together (each uses the previous cell's output) comprises a single RNN layer/network. So, roughly speaking, cells can be seen as the neurons of FFNN terminology, although their function is different. – inverted_index
@inverted_index, thank you for your answer. What do you mean by "a set of RNN/LSTM/GRU cells that are bound together"? Is it a set of cells at the current timestep, or is it a set of 'copies' of one cell through time? – Андрей Диденко

1 Answer

2 votes

Consider the diagrams you posted: as you can see, each cell uses the output of its predecessor. For example, when you want to feed x2 into your LSTM network, you have to use h1 from the previous cell (i.e., the output from the previous timestep) along with the vector x2. Feeding in these two gives you h2, which is then propagated forward to the next cell. This is an example of what happens at timestep t=2.

A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to its successor. In tutorials, though, these networks are unrolled for the sake of understandability. This is not exactly what happens in practice, since the pictured cells are not separate: they all share the same parameters, which are updated with each backpropagation iteration.

To make it more understandable, consider the code snippet below.

import torch
import torch.nn as nn

# X is the input sequence (e.g., a sequence of word-embedding vectors)
# steps iterates over the timesteps of the input sequence
# h0 and c0 are zero state vectors (a common choice) that are fed
# into the RNN cell at the first timestep
# h_out collects the hidden states that the RNN network outputs

seq_len, batch_size, input_dim, hidden_dim = 7, 1, 10, 20

RNN = nn.LSTMCell(input_dim, hidden_dim)  # one cell = one set of parameters

X = torch.randn(seq_len, batch_size, input_dim)
steps = range(seq_len)
h0 = torch.zeros(batch_size, hidden_dim)
c0 = torch.zeros(batch_size, hidden_dim)
hidden = (h0, c0)
h_out = []

for i in steps:
    # advance the cell by one timestep
    hidden = RNN(X[i], hidden)
    hy, cy = hidden
    h_out.append(hy)

Here RNN(., .) is an LSTM cell (a GRU or vanilla RNN cell would work the same way) with a bunch of trainable parameters, such as weight matrices and biases. These parameters are the same at every timestep: one shared set of weights processes, and learns from, every X[i] and hidden pair fed into the cell at each iteration.
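To see the sharing concretely, here is a minimal sketch (assuming PyTorch's nn.LSTMCell, with made-up sizes): one cell object is stepped through the whole sequence, so a loss at the end backpropagates through every "copy", and the gradients all land in the same shared weight tensors.

import torch
import torch.nn as nn

cell = nn.LSTMCell(10, 20)
x = torch.randn(5, 1, 10)          # 5 timesteps, batch of 1
h, c = torch.zeros(1, 20), torch.zeros(1, 20)

for t in range(5):
    h, c = cell(x[t], (h, c))      # the same cell object at every step

h.sum().backward()                 # gradients flow through all 5 steps
print(cell.weight_ih.grad.shape)   # torch.Size([80, 10]) -- one shared tensor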

So, back to your question: an RNN network is in effect multiple copies of the same RNN cell, all sharing one set of parameters that gets trained as training proceeds.
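And to connect this back to your question 3 (a sketch assuming PyTorch; Keras behaves analogously): frameworks package exactly the loop above as a single "layer" module, which unrolls the cell over the sequence internally.

import torch
import torch.nn as nn

# nn.LSTM is the per-timestep loop above packaged as one "layer"
lstm = nn.LSTM(input_size=10, hidden_size=20)
X = torch.randn(5, 1, 10)          # (timesteps, batch, features)
out, (hn, cn) = lstm(X)
print(out.shape)                   # torch.Size([5, 1, 20]) -- h at each step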