Dimension of hidden layer LSTM Pytorch

Question

I was reading the implementation of LSTM in Pytorch. The code goes like this:

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

I don't understand why the hidden state is defined by a tuple of two tensors instead of one? Since the hidden layer is simply a layer of the feed-forward neural network which is a vector.

Umang Gupta Umang Gupta · Accepted Answer · 2018-07-23T17:16:48

Apart from the hidden state, LSTM also has cell state, C. Therefore, a tuple is passed I think. See https://pytorch.org/docs/stable/nn.html#lstmcell.

If you don't pass C, it is taken to be all zeros.

Note that this is the case for LSTM, GRU or RNN do not have C.

Dimension of hidden layer LSTM Pytorch

1 Answers