I am trying to implement a simple example of how to apply cross-entropy to what is supposed to be the output of my semantic segmentation CNN.
Using the PyTorch layout, I would have something like this:
out = np.array([[
    [[1., 1., 1.],
     [0., 0., 0.],
     [0., 0., 0.],
     [0., 0., 0.]],

    [[0., 0., 0.],
     [1., 1., 1.],
     [0., 0., 0.],
     [0., 0., 0.]],

    [[0., 0., 0.],
     [0., 0., 0.],
     [1., 1., 1.],
     [0., 0., 0.]],

    [[0., 0., 0.],
     [0., 0., 0.],
     [0., 0., 0.],
     [1., 1., 1.]]
]])
out = torch.tensor(out)
So my output here has shape (1, 4, 4, 3): a batch of 1 element, 4 channels representing the 4 possible classes, and a 4-by-3 map in each channel storing the probability of that cell belonging to that class.
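As a quick sanity check on my side, the shape comes out as expected:

print(out.shape)  # torch.Size([1, 4, 4, 3]) -> (batch, classes, height, width)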
Now my target is like this:
target = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3]
]
Please notice how in the 'out' tensor each row has a probability of 1.0 for its own class, resulting in a perfect match with the target.
For example, the third channel (channel 2) has 1.0 probabilities across its entire third row (row 2) and zeros everywhere else, so it matches the 2's in the third row of the target.
With this example I expect a minimal loss value between the two tensors.
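To double-check this (my own quick verification, assuming the class channel is dimension 1), the per-pixel argmax of 'out' already equals the target exactly:

pred = out.argmax(dim=1)  # shape (1, 4, 3): predicted class index per pixel
print(torch.equal(pred, torch.tensor(target).unsqueeze(0)))  # prints True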
My questions are:
- What's the best way to use a cross-entropy loss in PyTorch so that it reflects that there is no difference between the target and the prediction in this case?
- What loss value should I expect from this?
This is what I got so far:
import torch
from torch.nn import CrossEntropyLoss
import numpy as np
import torch
from torch.nn import CrossEntropyLoss
import numpy as np

# Prediction: shape (1, 4, 4, 3) = (batch, classes, height, width)
out = torch.tensor(np.array([[
    [[1., 1., 1.],
     [0., 0., 0.],
     [0., 0., 0.],
     [0., 0., 0.]],

    [[0., 0., 0.],
     [1., 1., 1.],
     [0., 0., 0.],
     [0., 0., 0.]],

    [[0., 0., 0.],
     [0., 0., 0.],
     [1., 1., 1.],
     [0., 0., 0.]],

    [[0., 0., 0.],
     [0., 0., 0.],
     [0., 0., 0.],
     [1., 1., 1.]]
]]), dtype=torch.float32)

# Target: shape (1, 4, 3) = (batch, height, width), one class index per pixel
target = torch.tensor([[
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3]
]], dtype=torch.long)

criterion = CrossEntropyLoss()
print(criterion(out, target))
And outputs: tensor(0.7437)
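For what it's worth, this value matches what I get if I manually apply softmax to a single column of class scores like [1, 0, 0, 0] and take the negative log of the "correct" entry (this is just my own sanity check, which makes me suspect the loss treats my values as raw scores rather than probabilities):

import torch.nn.functional as F
scores = torch.tensor([1., 0., 0., 0.])  # the 4 channel values for one pixel
probs = F.softmax(scores, dim=0)         # tensor([0.4754, 0.1749, 0.1749, 0.1749])
print(-torch.log(probs[0]))              # tensor(0.7437)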
- Shouldn't I expect a value closer to zero?
Thank you in advance