
I'm using PyTorch with a training set of movie reviews each labeled positive or negative. Every review is truncated or padded to be 60 words and I have a batch size of 32. This 60x32 Tensor is fed to an embedding layer with an embedding dim of 100 resulting in a 60x32x100 Tensor. Then I use the unpadded lengths of each review to pack the embedding output, and feed that to a BiLSTM layer with hidden dim = 256.

I then pad it back, apply a transformation (to try to get the last hidden state for the forward and backward directions), and feed the result to a Linear layer that is 512x1. Here is my module. I pass the final output through a sigmoid, which is not shown here.

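import torch
import torch.nn as nn

# Assumed definition; `device` is used below but was not shown in the original post
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class RNN(nn.Module):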
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, 
                 bidirectional, dropout, pad_idx):
        
        super().__init__()
        self.el = nn.Embedding(vocab_size, embedding_dim)
        print('vocab size is ', vocab_size)
        print('embedding dim is ', embedding_dim)
        self.hidden_dim = hidden_dim
        self.num_layers = n_layers # 2
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=n_layers, dropout=dropout, bidirectional=bidirectional)
        # Have an output layer for outputting a single output value
        self.linear = nn.Linear(2*hidden_dim, output_dim)

    def init_hidden(self):
        # Zero-initialized (h_0, c_0); first dim is num_layers * 2 directions, batch size fixed at 32
        return (torch.zeros(self.num_layers*2, 32, self.hidden_dim).to(device), 
                torch.zeros(self.num_layers*2, 32, self.hidden_dim).to(device))
        
    def forward(self, text, text_lengths):
        print('input text size ', text.size())
        embedded = self.el(text)
        print('embedded size ', embedded.size())
        packed_seq = torch.nn.utils.rnn.pack_padded_sequence(embedded, lengths=text_lengths, enforce_sorted=False)
        packed_out, (ht, ct) = self.lstm(packed_seq, None)
        out_rnn, out_lengths = torch.nn.utils.rnn.pad_packed_sequence(packed_out)
        print('padded lstm out ', out_rnn.size())        
        #out_rnn = out_rnn[-1] #this works
        #out_rnn = torch.cat((out_rnn[-1, :, :self.hidden_dim], out_rnn[0, :, self.hidden_dim:]), dim=1) # this works
        out_rnn = torch.cat((ht[-1], ht[0]), dim=1) #this works
        #out_rnn = out_rnn[:, -1, :] #doesn't work maybe should
        print('attempt to get last hidden ', out_rnn.size())
        linear_out = self.linear(out_rnn)
        print('after linear ', linear_out.size())
        return linear_out

I've tried 3 different transformations to get the dimensions correct for the linear layer:

out_rnn = out_rnn[-1] #this works
out_rnn = torch.cat((out_rnn[-1, :, :self.hidden_dim], out_rnn[0, :, self.hidden_dim:]), dim=1) # this works
out_rnn = torch.cat((ht[-1], ht[0]), dim=1) #this works

These all produce output like this:

input text size torch.Size([60, 32])

embedded size torch.Size([60,32, 100])

padded lstm out torch.Size([36, 32, 512])

attempt to get last hidden torch.Size([32, 512])

after linear torch.Size([32, 1])

I would expect the padded lstm out to be [60, 32, 512] but it is always less than 60 in the first dimension.
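
For reference, a minimal sketch of where that shorter first dimension comes from. By default pad_packed_sequence pads only up to the longest sequence in the batch, and its total_length argument pads back to a fixed length. The batch size of 4 and the specific lengths here are made up for illustration; only the 60/100/256 sizes come from the setup above.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Sizes from the question, except batch_size and lengths (illustrative assumptions)
max_len, batch_size, embedding_dim, hidden_dim = 60, 4, 100, 256
embedded = torch.randn(max_len, batch_size, embedding_dim)
lengths = torch.tensor([36, 20, 12, 5])  # longest review in this batch is 36 words

lstm = nn.LSTM(embedding_dim, hidden_dim, bidirectional=True)
packed = pack_padded_sequence(embedded, lengths, enforce_sorted=False)
packed_out, _ = lstm(packed)

# Default: padded only up to the longest sequence in this batch
out_default, _ = pad_packed_sequence(packed_out)

# With total_length: padded back to the fixed length of 60
out_fixed, out_lengths = pad_packed_sequence(packed_out, total_length=max_len)

print(out_default.shape)  # torch.Size([36, 4, 512])
print(out_fixed.shape)    # torch.Size([60, 4, 512])
```

Positions past each review's true length are filled with zeros, so indexing the padded output at the last time step reads padding for every review shorter than the longest one.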

I'm training for 10 epochs with optim.SGD and nn.BCEWithLogitsLoss(). My training accuracy is always around 52% and my test accuracy is always around 50%, so the model is doing no better than random guessing. I'm sure that my data is being handled correctly in my torchtext.data.Dataset. Am I forwarding my tensors along incorrectly?

I have tried using batch_first=True in my LSTM, in pack_padded_sequence, and in pad_packed_sequence, but that breaks my transformations before the linear layer.

Update: I added the init_hidden method and tried without the pack/pad sequence functions, and I still get the same results.

It's unclear here, but did you zero out the hidden states at each iteration? Your model class is missing the typical init_hidden() method for LSTM networks. Another culprit might be the pack/pad functions. I would try without them first to make sure everything works. - neurite
I added init_hidden and tried without the pack/pad functions and still get the same results. Is there a correct method for getting the last hidden state out of the 3 possible transformations I'm doing between the LSTM and linear layer? All 3 give about the same results. - gary69
According to the doc of pad_packed_sequence, the returned tensor is "T x B x *, where T is the length of the longest sequence". My interpretation is that T is the longest length within the batch. That would explain why it is always <= 60. There is the optional total_length argument to pad it to a fixed length. - neurite
The PyTorch doc of pad_packed_sequence also says of the output tensor: "Batch elements will be ordered decreasingly by their length." So when computing the loss, did you restore the original order of the batch? - neurite
Thank you, I did not restore the original order. Currently I am running it without the pack/pad functions and getting 50% accuracy. I'm trying to get the accuracy up without those functions first, and then I will add them back. - gary69
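
On the question in the comments about which transformation picks out the last hidden state, here is a hedged sketch rather than a definitive answer. For a bidirectional nn.LSTM, ht has shape (num_layers * 2, batch, hidden_dim) and is ordered layer by layer, forward then backward, so the last layer's two directions are ht[-2] and ht[-1]. By contrast, ht[0] is the first layer's forward state. The check below uses unpacked, full-length random input (an assumption made to keep the example small) and compares against the slice taken from the LSTM output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sizes taken from the question; the input is random stand-in data
seq_len, batch_size, embedding_dim, hidden_dim, n_layers = 60, 32, 100, 256, 2
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=True)
embedded = torch.randn(seq_len, batch_size, embedding_dim)

output, (ht, ct) = lstm(embedded)

# Last layer: forward final state is ht[-2], backward final state is ht[-1]
last_hidden = torch.cat((ht[-2], ht[-1]), dim=1)

# Same quantity read from the output: forward half at the last time step,
# backward half at the first time step
from_output = torch.cat(
    (output[-1, :, :hidden_dim], output[0, :, hidden_dim:]), dim=1
)

print(last_hidden.shape)                                   # torch.Size([32, 512])
print(torch.allclose(last_hidden, from_output, atol=1e-5)) # True
```

With packed, variable-length input, ht still holds each review's true final state, whereas output[-1] after padding does not for the shorter reviews.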

1 Answer


I changed my optimizer from SGD to Adam and reduced the number of LSTM layers from 2 to 1, and my model started to learn, reaching accuracy above 75%.
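
A minimal sketch of that configuration, for anyone who wants to try it. The class name ReviewClassifier, the vocabulary size of 1000, the random stand-in batch, and Adam's default learning rate are all assumptions for illustration, not the settings from my actual run.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class ReviewClassifier(nn.Module):  # hypothetical name, not the original RNN class
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Single bidirectional layer instead of two
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=1, bidirectional=True)
        self.linear = nn.Linear(2 * hidden_dim, 1)

    def forward(self, text):
        embedded = self.embedding(text)       # (seq_len, batch, embedding_dim)
        _, (ht, _) = self.lstm(embedded)      # ht: (2, batch, hidden_dim)
        # Concatenate the forward and backward final states
        return self.linear(torch.cat((ht[-2], ht[-1]), dim=1))

model = ReviewClassifier(vocab_size=1000, embedding_dim=100, hidden_dim=256)
optimizer = optim.Adam(model.parameters())  # Adam in place of SGD; default learning rate assumed
criterion = nn.BCEWithLogitsLoss()          # expects raw logits, so no sigmoid in forward()

# Random stand-in batch: 60 tokens x 32 reviews, with binary labels
text = torch.randint(0, 1000, (60, 32))
labels = torch.randint(0, 2, (32,)).float()

# One training step
optimizer.zero_grad()
logits = model(text).squeeze(1)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

print(logits.shape)  # torch.Size([32])
```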