I'm trying to practice with LSTMs and PyTorch. I took the IMDB movie review dataset to predict whether a review is positive or negative. I use 80% of the dataset for training, remove punctuation, and use 200-dimensional GloVe vectors for the embedding layer. Before training, I also exclude reviews that are too short (fewer than 50 symbols) or too long (more than 1000 symbols).
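For reference, the cleaning and length filtering looks roughly like this (a simplified sketch, not my exact code; reviews and labels are placeholder names):

import string

def clean_text(text):
    # lowercase and strip punctuation before tokenizing
    text = text.lower()
    return text.translate(str.maketrans('', '', string.punctuation))

# keep only reviews between 50 and 1000 symbols after cleaning
cleaned = [clean_text(r) for r in reviews]
kept = [(r, lbl) for r, lbl in zip(cleaned, labels) if 50 <= len(r) <= 1000]
reviews, labels = map(list, zip(*kept))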
For the LSTM layer I use a hidden dimension of 256, num_layers of 2, the unidirectional (not bidirectional) setting, and a dropout of 0.5. Afterwards, I have a fully connected layer. For training I use the nn.BCELoss function with the Adam optimizer (lr=0.001).
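The training loop itself is essentially the standard one (a sketch; model is an instance of the class below, train_loader is my DataLoader, and the labels are assumed to match the model's output shape):

import torch
import torch.nn as nn

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(7):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        probs = model(inputs)                    # sigmoid probabilities
        loss = criterion(probs, labels.float())  # labels shaped like probs
        loss.backward()
        optimizer.step()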
Currently I get 85% validation accuracy with 98% training accuracy after 7 epochs. I tried the following steps to prevent overfitting and to get higher accuracy:
- used weight_decay for the Adam optimizer,
- tried SGD (lr=0.1, 0.001) instead of Adam,
- tried increasing num_layers of the LSTM.
In all of these cases the model didn't learn at all, giving around 50% accuracy on both the training and validation sets. (The variants I tried are sketched below.)
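Roughly, the variants looked like this (a sketch; the weight_decay value, the layer count, and glove_matrix are only illustrative placeholders):

# 1) Adam with weight decay
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

# 2) plain SGD instead of Adam (tried lr=0.1 and lr=0.001)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 3) a deeper LSTM, e.g. three layers instead of two
model = CustomLSTM(vocab_size, use_embed=True, embed=glove_matrix, num_lstm_layers=3)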
My model definition:

import torch
import torch.nn as nn

class CustomLSTM(nn.Module):
    def __init__(self, vocab_size, use_embed=False, embed=None, embedding_size=200, hidden_size=256,
                 num_lstm_layers=2, bidirectional=False, dropout=0.5, output_dims=2):
        super().__init__()
        self.vocab_size = vocab_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.num_lstm_layers = num_lstm_layers
        self.bidirectional = bidirectional
        self.dropout = dropout

        # embedding layer, optionally initialized with pretrained GloVe vectors
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        if use_embed:
            self.embedding.weight.data.copy_(torch.from_numpy(embed))
            # self.embedding.requires_grad = False

        self.lstm = nn.LSTM(input_size=embedding_size,
                            hidden_size=hidden_size,
                            num_layers=num_lstm_layers,
                            batch_first=True,
                            dropout=dropout,
                            bidirectional=bidirectional)

        # dropout + fully connected layer + sigmoid on top of the LSTM output
        self.drop_fc = nn.Dropout(0.5)
        self.fc = nn.Linear(hidden_size, output_dims)
        self.sig = nn.Sigmoid()
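The forward pass is omitted above; a minimal sketch of what it does (take the last time step of the LSTM output, then apply dropout, the linear layer, and the sigmoid):

    def forward(self, x):
        # x: (batch, seq_len) of token indices
        embedded = self.embedding(x)              # (batch, seq_len, embedding_size)
        lstm_out, _ = self.lstm(embedded)         # (batch, seq_len, hidden_size)
        last_step = lstm_out[:, -1, :]            # hidden state at the last time step
        out = self.fc(self.drop_fc(last_step))    # (batch, output_dims)
        return self.sig(out)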
I want to understand:
- Why does the model not learn at all with these changes applied?
- How can I increase the accuracy?