This seems to be one of the most common questions about LSTMs in PyTorch, but I still cannot figure out what the input shape to a PyTorch LSTM should be.
Even after following several posts (1, 2, 3) and trying out the solutions, it doesn't seem to work.
Background: I have encoded text sequences (variable length) in a batch of size 12, and the sequences are padded and packed using the pad_packed_sequence functionality. MAX_LEN for each sequence is 384 and each token (or word) in the sequence has a dimension of 768. Hence my batch tensor could have one of the following shapes: [12, 384, 768] or [384, 12, 768].
The batch will be my input to the PyTorch rnn module (lstm here).
According to the PyTorch documentation for LSTMs, its input dimensions are (seq_len, batch, input_size), which I understand as follows:
- seq_len - the number of time steps in each input stream (feature vector length).
- batch - the size of each batch of input sequences.
- input_size - the dimension for each input token or time step.
lstm = nn.LSTM(input_size=?, hidden_size=?, batch_first=True)
What should be the exact input_size and hidden_size values here?
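For concreteness, here is a minimal sketch of the setup described above, so the shapes can be checked directly. It assumes that input_size is the per-token dimension (768) and that hidden_size is a free design choice; the value 256, the random data, and the random lengths are purely illustrative and not part of my actual pipeline. With batch_first=True the module expects (batch, seq_len, input_size), i.e. [12, 384, 768].

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

BATCH, MAX_LEN, EMB = 12, 384, 768  # values from the question
HIDDEN = 256                        # assumption: any positive integer works here

# Stand-in for the padded, encoded batch: (batch, seq_len, input_size)
batch = torch.randn(BATCH, MAX_LEN, EMB)
# Hypothetical true lengths of each sequence before padding (CPU int64 tensor)
lengths = torch.randint(1, MAX_LEN + 1, (BATCH,))

lstm = nn.LSTM(input_size=EMB, hidden_size=HIDDEN, batch_first=True)

# Pack so the LSTM skips padded time steps; enforce_sorted=False avoids
# having to sort the batch by length beforehand.
packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor of the original MAX_LEN
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True, total_length=MAX_LEN)

print(out.shape)  # torch.Size([12, 384, 256])
print(h_n.shape)  # torch.Size([1, 12, 256])
```

Under these assumptions, input_size is fixed by the data (768), while hidden_size only determines the size of the last dimension of the output and of the hidden state.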