
I'm practicing with LSTMs and PyTorch. I took the IMDB movie review dataset to predict whether a review is positive or negative. I use 80% of the dataset for training, remove punctuation, and use GloVe (200 dimensions) as the embedding layer.

Before training, I also exclude reviews that are too short (fewer than 50 symbols) or too long (more than 1000 symbols).
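The preprocessing described above could be sketched roughly as follows. The function names are hypothetical, and the sketch assumes that "symbols" means characters and that the length is measured after punctuation is removed; the post does not say either way.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text):
    """Strip ASCII punctuation from a review."""
    return text.translate(_PUNCT_TABLE)


def keep_review(text, min_len=50, max_len=1000):
    """Keep reviews whose character count lies within [min_len, max_len]."""
    return min_len <= len(text) <= max_len


def preprocess(reviews, labels):
    """Clean every review, then drop the ones that are too short or too long."""
    cleaned = [(clean_review(r), y) for r, y in zip(reviews, labels)]
    # Assumption: the length filter is applied to the cleaned text.
    return [(r, y) for r, y in cleaned if keep_review(r)]
```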

For the LSTM layer I use a hidden dimension of 256, num_layers of 2, and a unidirectional configuration with 0.5 dropout. After that, I have a fully connected layer. For training I used the nn.BCELoss function with the Adam optimizer (lr=0.001).

Currently I get 85% validation accuracy with 98% training accuracy after 7 epochs. I tried the following steps to prevent overfitting and get higher accuracy:

  • used weight_decay for Adam optimizer,
  • tried SGD (lr=0.1, 0.001) instead of Adam,
  • tried increasing num_layers of the LSTM.

In all of these cases the model didn't learn at all, giving 50% accuracy on both the training and validation sets.
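For reference, the optimizer variants listed above would look roughly like this in PyTorch. This is only a sketch: the model is a stand-in so the snippet runs on its own, and the weight_decay value is an assumption because the post does not state which one was used.

```python
import torch
import torch.nn as nn

# Stand-in model so the snippet is self-contained; in the post this is CustomLSTM.
model = nn.Linear(200, 1)

# Baseline from the post: Adam with lr=0.001.
adam = torch.optim.Adam(model.parameters(), lr=0.001)

# Adam with L2 regularisation. The value 1e-5 is an assumption; the post gives none.
adam_wd = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

# The two SGD learning rates mentioned in the post.
sgd_high_lr = torch.optim.SGD(model.parameters(), lr=0.1)
sgd_low_lr = torch.optim.SGD(model.parameters(), lr=0.001)
```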

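import torch
import torch.nn as nn

class CustomLSTM(nn.Module):
    def __init__(self, vocab_size, use_embed=False, embed=None, embedding_size=200, hidden_size=256,
                 num_lstm_layers=2, bidirectional=False, dropout=0.5, output_dims=2):
        super().__init__()  # must run before any submodule is assigned

        self.vocab_size = vocab_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.num_lstm_layers = num_lstm_layers
        self.bidirectional = bidirectional
        self.dropout = dropout

        self.embedding = nn.Embedding(vocab_size, embedding_size)
        if use_embed:
            # The body of this branch is missing from the post. Assumption: embed holds the
            # pretrained GloVe matrix of shape (vocab_size, embedding_size).
            self.embedding.weight.data.copy_(torch.as_tensor(embed))
            # self.embedding.weight.requires_grad = False  # uncomment to freeze the embeddings
        # The nn.LSTM call is cut off in the post after input_size. The remaining arguments are
        # reconstructed from the constructor parameters; batch_first=True is an assumption.
        self.lstm = nn.LSTM(input_size=embedding_size,
                            hidden_size=hidden_size,
                            num_layers=num_lstm_layers,
                            bidirectional=bidirectional,
                            dropout=dropout,
                            batch_first=True)
        # print('output dims value ', output_dims)
        self.drop_fc = nn.Dropout(0.5)
        self.fc = nn.Linear(hidden_size, output_dims)
        self.sig = nn.Sigmoid()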

I want to understand:

  1. Why doesn't the model learn at all with those changes applied?
  2. How can I increase the accuracy?

1 Answer


You could try adding attention after the LSTM layers. I have tried it before on the same problem.

BiLSTM with Attention Based Sentiment Analysis
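As a rough illustration of what attention after the LSTM layers can mean, here is a minimal sketch of a learned attention pooling layer. It is not taken from the linked article; the class name AttentionPooling and the optional padding mask are assumptions, and it expects LSTM outputs in batch_first layout.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Collapse LSTM outputs into one vector per review using learned weights."""

    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # one relevance score per time step

    def forward(self, lstm_out, mask=None):
        # lstm_out: (batch, seq_len, hidden_size) from an LSTM with batch_first=True
        scores = self.score(torch.tanh(lstm_out)).squeeze(-1)  # (batch, seq_len)
        if mask is not None:
            # mask is a bool tensor, True for real tokens; padding gets zero weight
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)  # (batch, seq_len), rows sum to 1
        # Weighted sum of the time steps: (batch, 1, seq_len) x (batch, seq_len, hidden)
        context = torch.bmm(weights.unsqueeze(1), lstm_out).squeeze(1)
        return context, weights


# Usage with the dimensions from the question (200-dim embeddings, hidden size 256).
lstm = nn.LSTM(200, 256, num_layers=2, batch_first=True)
attn = AttentionPooling(256)
outputs, _ = lstm(torch.randn(4, 30, 200))
context, weights = attn(outputs)  # context: (4, 256), fed to the final linear layer
```

The context vector replaces the last hidden state as input to the fully connected layer, so the classifier can draw on every time step instead of only the final one.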

Another option is to use a different architecture, such as a CNN combined with an ensemble technique (it worked great for me).

Distinguish Positive and Negative Documents
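The answer does not say which CNN or which ensemble technique was used, so the following is only one possible reading: a small text CNN with several kernel widths, and an ensemble that averages the predicted probabilities of several independently trained models. All names and hyperparameters here are assumptions.

```python
import torch
import torch.nn as nn


class TextCNN(nn.Module):
    """Convolutions of several widths over word embeddings, max-pooled over time."""

    def __init__(self, vocab_size, embedding_size=200, num_filters=100,
                 kernel_sizes=(3, 4, 5), dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embedding_size, num_filters, k) for k in kernel_sizes]
        )
        self.drop = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids, with seq_len >= max(kernel_sizes)
        x = self.embedding(tokens).transpose(1, 2)  # (batch, embedding_size, seq_len)
        # One max-pooled feature vector per kernel width.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.drop(torch.cat(pooled, dim=1))
        return torch.sigmoid(self.fc(features)).squeeze(1)  # P(positive), shape (batch,)


def ensemble_predict(models, tokens):
    """Average the predicted probabilities of several trained models."""
    with torch.no_grad():
        probs = torch.stack([m.eval()(tokens) for m in models])
    return probs.mean(dim=0)
```

Because each model ends in a sigmoid with a single output, its predictions can be trained with nn.BCELoss against float labels of shape (batch,).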