2 votes

I'm training an LSTM model in PyTorch with a batch size of 256 and NLLLoss() as the loss function. The loss function is having a problem with the data shape.

The softmax output from the forward pass has a shape of torch.Size([256, 4, 1181]), where 256 is the batch size, 4 is the sequence length, and 1181 is the vocab size.

The target has a shape of torch.Size([256, 4]), where 256 is the batch size and 4 is the output sequence length.

When I was testing earlier with a batch size of 1, the model worked fine, but when I added the batch size, it broke. I read that NLLLoss() can take class targets as input instead of one-hot encoded targets.

Am I misunderstanding it, or did I not format the shape of the target correctly?
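
For reference, here is a minimal check (with made-up sizes, not my real data) confirming that NLLLoss() accepts class indices directly for a plain (N, C) input:

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.NLLLoss()
log_probs = F.log_softmax(torch.randn(3, 5), dim=1)  # (N, C) log-probabilities
targets = torch.tensor([0, 4, 2])                    # class indices, not one-hot
print(loss_fn(log_probs, targets))                   # scalar loss, no error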

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class LSTM(nn.Module):

    def __init__(self, embed_size=100, hidden_size=100, vocab_size=1181, embedding_matrix=...):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        # Frozen embeddings initialized from a pretrained matrix
        self.word_embeddings = nn.Embedding(vocab_size, embed_size)
        self.word_embeddings.load_state_dict({'weight': torch.Tensor(embedding_matrix)})
        self.word_embeddings.weight.requires_grad = False
        # batch_first=True so the LSTM accepts (batch, seq, feature) input
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.hidden2out = nn.Linear(hidden_size, vocab_size)


    def forward(self, tokens):
        batch_size, num_steps = tokens.shape
        embeds = self.word_embeddings(tokens)          # (batch, seq, embed)
        lstm_out, _ = self.lstm(embeds)                # (batch, seq, hidden)
        out_space = self.hidden2out(lstm_out)          # (batch, seq, vocab)
        out_scores = F.log_softmax(out_space, dim=-1)  # normalize over the vocab dim
        return out_scores

model = LSTM(self.config.embed_size, self.config.hidden_size, self.config.vocab_size, self.embedding_matrix)
loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=self.config.lr)

Error:

~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1846         if target.size()[1:] != input.size()[2:]:
   1847             raise ValueError('Expected target size {}, got {}'.format(
-> 1848                 out_size, target.size()))
   1849         input = input.contiguous().view(n, c, 1, -1)
   1850         target = target.contiguous().view(n, 1, -1)

ValueError: Expected target size (256, 554), got torch.Size([256, 4])
Comments:

Your way of use seems correct. Better add your exact error. – akshayk07

@akshayk07 Added. – TYZ

model = LSTM(self.config.embed_size, self.config.hidden_size, self.config.vocab_size, self.embedding_matrix) -> In this line, can you use the proper sizes directly? Might be causing the issue. – akshayk07

@TYZ Check the docs for nn.NLLLoss describing the shapes again carefully. If the input is (N, C, d), then the target should be (N, d). In your case it looks like the input is (N, d, C), so transposing dims 1 and 2 of the input may fix your issue. – jodag

@TYZ Sorry, I'm referring to the input to the loss function, i.e. the network output. I was just commenting that if x is the (N, d, C) network output and y is the (N, d) targets, then loss_function(x.reshape(N*d, C), y.reshape(N*d)) is the same as loss_function(x.transpose(1, 2), y), at least for NLLLoss. – jodag

1 Answer

3 votes

Your input shape to the loss function is (N, d, C) = (256, 4, 1181) and your target shape is (N, d) = (256, 4). However, according to the docs for NLLLoss, the input should be (N, C, d) for a target of (N, d).

Supposing x is your network output and y is the target, you can compute the loss by transposing the incorrect dimensions of x as follows:

loss = loss_function(x.transpose(1, 2), y)

Alternatively, since NLLLoss is just averaging all the responses anyway, you can avoid creating copies of the data by reshaping x and y into (N*d, C) and (N*d) tensors, and you get the same result:

loss = loss_function(x.reshape(N*d, C), y.reshape(N*d))
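
As a sanity check, here is a small self-contained sketch using random data matching the question's shapes (the names N, d, C, x, and y are from the discussion above), showing that both forms produce the same loss:

import torch
import torch.nn as nn
import torch.nn.functional as F

N, d, C = 256, 4, 1181                           # batch, sequence length, vocab size
x = F.log_softmax(torch.randn(N, d, C), dim=-1)  # network output, shape (N, d, C)
y = torch.randint(0, C, (N, d))                  # class-index targets, shape (N, d)

loss_function = nn.NLLLoss()
loss_a = loss_function(x.transpose(1, 2), y)                   # (N, C, d) input
loss_b = loss_function(x.reshape(N * d, C), y.reshape(N * d))  # flattened input
print(torch.allclose(loss_a, loss_b))                          # True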