
I have a question regarding the computation done by the categorical cross-entropy loss (nn.CrossEntropyLoss) in PyTorch. I made this short code snippet, and since I use the argmax of the output tensor as the targets, I cannot understand why the loss is still high.

import torch
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
outputs = torch.randn(3, 5, requires_grad=True)  # random logits, shape (batch=3, classes=5)
targets = torch.argmax(outputs, dim=1)           # use each row's argmax as its target class
loss = ce_loss(outputs, targets)
print(loss)

Thanks for helping me understand it. Best regards, Jerome

What do you mean by high? Look at my answer to see how the loss can be computed. - Nebiyou Yismaw

1 Answer


So here is sample data from your code, with the outputs, labels, and loss having the following values:

outputs =  tensor([[ 0.5968, -0.8249,  1.5018,  2.7888, -0.6125],
                   [-1.1534, -0.4921,  1.0688,  0.2241, -0.0257],
                   [ 0.3747,  0.8957,  0.0816,  0.0745,  0.2695]], requires_grad=True)

labels = tensor([3, 2, 1])
loss = tensor(0.7354, grad_fn=<NllLossBackward>)

So let's examine the values.

If you compute the softmax of your logits (outputs) using something like torch.softmax(outputs, dim=1), you will get

probs = tensor([[0.0771, 0.0186, 0.1907, 0.6906, 0.0230],
                [0.0520, 0.1008, 0.4801, 0.2063, 0.1607],
                [0.1972, 0.3321, 0.1471, 0.1461, 0.1775]], grad_fn=<SoftmaxBackward>)

So these will be your prediction probabilities.
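
For example, here is a minimal sketch that reproduces these probabilities from the sample logits above (the exact numbers depend on that particular random run, so treat them as illustrative):

import torch

outputs = torch.tensor([[ 0.5968, -0.8249,  1.5018,  2.7888, -0.6125],
                        [-1.1534, -0.4921,  1.0688,  0.2241, -0.0257],
                        [ 0.3747,  0.8957,  0.0816,  0.0745,  0.2695]])

probs = torch.softmax(outputs, dim=1)  # softmax turns each row of logits into a probability distribution
print(probs)                           # matches the probs tensor above (up to rounding)
print(probs.sum(dim=1))                # each row sums to 1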

Now cross-entropy loss is nothing but a combination of softmax and negative log likelihood loss. Hence, your loss can simply be computed using

loss = (torch.log(1/probs[0,3]) +  torch.log(1/probs[1,2]) + torch.log(1/probs[2,1])) / 3

This is the average of the negative log of the probabilities of your true labels. The expression above evaluates to 0.7354, which is the same value returned by the nn.CrossEntropyLoss module. So even though each target is the argmax of its row, the softmax probabilities of those classes (0.6906, 0.4801, 0.3321) are well below 1, their negative logs are therefore nonzero, and the loss stays noticeably above zero.
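
If you want to check this numerically, here is a small sketch (again using the sample tensors above, so the numbers are just illustrative) comparing the manual average with the built-in module:

import torch
import torch.nn as nn

outputs = torch.tensor([[ 0.5968, -0.8249,  1.5018,  2.7888, -0.6125],
                        [-1.1534, -0.4921,  1.0688,  0.2241, -0.0257],
                        [ 0.3747,  0.8957,  0.0816,  0.0745,  0.2695]])
labels = torch.tensor([3, 2, 1])

probs = torch.softmax(outputs, dim=1)

# negative log of the probability assigned to each true label, averaged over the batch
manual_loss = -torch.log(probs[torch.arange(3), labels]).mean()

# the same value from the built-in module
module_loss = nn.CrossEntropyLoss()(outputs, labels)

print(manual_loss, module_loss)  # both are roughly 0.7354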