I am experimenting with some PyTorch code. I found some interesting results with cross-entropy loss, using both PyTorch's binary cross-entropy loss and its cross-entropy loss.

import torch
import torch.nn as nn

X = torch.tensor([[1,0],[1,0],[0,1],[0,1]],dtype=torch.float)
softmax = nn.Softmax(dim=1)


bce_loss = nn.BCELoss()
ce_loss= nn.CrossEntropyLoss()

pred = softmax(X)

bce_loss(X,X) # tensor(0.)
bce_loss(pred,X) # tensor(0.3133)
bce_loss(pred,pred) # tensor(0.5822)

ce_loss(X,torch.argmax(X,dim=1)) # tensor(0.3133)

I expected the cross-entropy loss to be zero when the input and target are the same. Here X, pred, and torch.argmax(X,dim=1) are the same up to a simple transformation. That reasoning only held for bce_loss(X,X) # tensor(0.), while all the other calls produced a loss greater than zero. I expected bce_loss(pred,X), bce_loss(pred,pred), and ce_loss(X,torch.argmax(X,dim=1)) to return zero as well.

What is the mistake here?

2 Answers

2 votes

The reason you are seeing this is that nn.CrossEntropyLoss expects logits as input and class indices as targets; that is, X should contain logits, but yours is already between 0 and 1. The logits should be much larger in magnitude, because the softmax applied internally will squash them into the 0-1 range.

ce_loss(X * 1000, torch.argmax(X,dim=1)) # tensor(0.)

nn.CrossEntropyLoss works with logits so that it can use the log-sum-exp trick for numerical stability.

The way you are currently calling it, X gets activated again inside the loss, so each prediction becomes roughly [0.73, 0.27] instead of [1, 0].
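
To make that concrete, here is a small sketch (reusing the X from the question) showing that the loss applies log-softmax to its input internally, so the reported 0.3133 is just -log(0.7311):

import torch
import torch.nn.functional as F

X = torch.tensor([[1,0],[1,0],[0,1],[0,1]], dtype=torch.float)
targets = torch.argmax(X, dim=1)

# CrossEntropyLoss log-softmaxes its input, so the probabilities it sees
# are softmax(X), not X itself.
print(F.softmax(X, dim=1)[0])  # tensor([0.7311, 0.2689])

# The loss is the mean negative log-probability of the correct class:
log_probs = F.log_softmax(X, dim=1)
print(-log_probs[torch.arange(len(X)), targets].mean())  # tensor(0.3133)
print(F.cross_entropy(X, targets))                       # tensor(0.3133)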

The binary cross-entropy example works because nn.BCELoss expects already-activated inputs (probabilities). By the way, you would normally use nn.Sigmoid to activate the logits for binary cross-entropy; for this 2-class example, softmax is also fine.
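
As a rough sketch of that last point (the logit and target values below are made up for illustration): activating raw scores with nn.Sigmoid before nn.BCELoss gives the same result as nn.BCEWithLogitsLoss on the raw scores, which is the numerically safer option.

import torch
import torch.nn as nn

logits = torch.tensor([[2.0, -1.5], [0.3, 1.2]])   # made-up raw scores
targets = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

# sigmoid + BCELoss ...
loss_a = nn.BCELoss()(nn.Sigmoid()(logits), targets)
# ... matches BCEWithLogitsLoss applied directly to the logits.
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_a, loss_b)  # the two values agree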

0 votes

I have posted a manual implementation of cross-entropy and NLLLoss here, as an answer to a related PyTorch CrossEntropyLoss question. It may not be perfect, but do check it out.

Edit: I did not include the code in my previous post, so the post was deleted. Following the suggestion, here is the part of the code (copied directly from the link above) that computes the cross-entropy loss:

import torch

def compute_crossentropyloss_manual(x, y0):
    """
    x holds the raw logits (not probabilities), with shape (batch_size, C).
    y0 has shape (batch_size,); its entries are integer class indices from 0 to C-1.
    """
    loss = 0.
    n_batch, n_class = x.shape
    for x1, y1 in zip(x, y0):
        class_index = int(y1.item())
        # softmax probability of the correct class, accumulated as a log-probability
        loss = loss + torch.log(torch.exp(x1[class_index]) / torch.exp(x1).sum())
    loss = -loss / n_batch
    return loss
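
A quick sanity check of this sketch against PyTorch's built-in loss, using some random logits (the values are arbitrary, assuming x contains raw logits as described above):

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3)            # batch of 4 samples, 3 classes (arbitrary logits)
y = torch.tensor([0, 2, 1, 2])   # target class indices

print(compute_crossentropyloss_manual(x, y))  # manual result
print(nn.CrossEntropyLoss()(x, y))            # should match up to float error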