
This loss function in TensorFlow is used in Keras/TensorFlow models to weight binary decisions.

It weights false positives vs false negatives:

targets * -log(sigmoid(logits)) + (1 - targets) * -log(1 - sigmoid(logits))

The argument pos_weight is used as a multiplier for the positive targets:

targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits))
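As a sanity check, the two formulas above can be implemented directly in plain NumPy (a sketch; the function name `weighted_bce` and its signature are my own, and in practice you would use TensorFlow's built-in weighted cross-entropy, which takes the `pos_weight` argument):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_bce(targets, logits, pos_weight=1.0):
    """Per-example weighted binary cross entropy, matching the formula above.
    Illustrative sketch only; the name and signature are hypothetical."""
    p = sigmoid(logits)
    return -(targets * np.log(p) * pos_weight
             + (1.0 - targets) * np.log(1.0 - p))

# With pos_weight=1 this is plain binary cross entropy:
t = np.array([1.0, 0.0])
l = np.array([0.0, 0.0])   # sigmoid(0) = 0.5
print(weighted_bce(t, l))  # both terms equal log(2) ~ 0.693

# pos_weight=10 makes a miss on a positive target 10x as costly:
print(weighted_bce(np.array([1.0]), np.array([0.0]), pos_weight=10.0))  # ~6.93
```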

Does anybody have a suggestion for how, in addition, true positives could be weighted against true negatives when their loss/reward should not have equal weight?

I am stating the obvious, but true positives/negatives are generally not included in the loss. The network classified them correctly, and we generally don't penalize it for doing so. What do you want to achieve by "weighting" true positives vs negatives? — iga
My problem is that a false negative has opportunity costs that need to be modeled into the cross entropy loss function somehow. The opportunity costs are equal to the payoff (weight) I'm giving to true positives, which is +10. False positives have a different payoff of -1. But the decisions are binary and not probabilistic; only y_pred is probabilistic, as it originates from an output neuron with a sigmoid activation function. In essence, tp are more important (+10) than tn (zero payoff) and need to be weighted accordingly to make the model more 'aggressive' towards predicting true. — Nickpick
If I understand correctly and your goal is to make the model more likely to predict true, make the relative cost of false positives very small. The network will see that it is ok to say "yes" and be wrong but not ok to say "no" and be wrong. In the extreme case, if you make the cost of false positives zero, I bet the network will learn to always output "yes"... What I am saying at a high level is that I don't see in your description an objective (as defined by desired network behavior) that you cannot achieve by weighting false positives vs false negatives. — iga
Yes, you understand my objective correctly. Are you saying I'll simply need to adjust the weights of the false positives and false negatives to include the additional 'draw' I want from true positives and true negatives? If so, how exactly would this have to be done? — Nickpick
Unless you are in some RL setting (or at least doing something like policy gradient/REINFORCE), I don't know a way to train a net with backpropagation using rewards. You can only penalize mistakes, not assign rewards. That being said, I think you can achieve the desired behavior by assigning different weights to different mistakes, even if you don't explicitly "reward" the behavior. If this does not make much sense, please precisely define the desired behavior and explain why you think this behavior can't be achieved with the suggested mechanism. I am not sure what more to say at this point. — iga

1 Answer


First, note that with cross entropy loss, there is some (possibly very very small) penalty to each example (even if correctly classified). For example, if the correct class is 1 and our logit is 10, the penalty will be

-log(sigmoid(10)) ≈ 4.5e-5

This loss (very slightly) pushes the network to produce even higher logit for this case to get its sigmoid even closer to 1. Similarly, for negative class, even if the logit is -10, the loss will push it to be even more negative.
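These numbers are easy to verify in plain Python (the helper name `bce` is my own):

```python
import math

def bce(target, logit):
    """Standard per-example binary cross entropy on a logit."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

print(bce(1, 10))   # ~4.5e-5: tiny, but nonzero
print(bce(1, 20))   # ~2.1e-9: pushing the logit even higher still lowers the loss
```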

This is usually fine because the loss from such terms is very small. If you would like your network to actually achieve zero loss, you can use label_smoothing. This is probably as close to "rewarding" the network as you can get in the classic setup of minimizing loss (you can obviously "reward" the network by adding some negative number to the loss. That won't change the gradient and training behavior though).
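The effect of label smoothing can be sketched by hand for the binary case (this uses the usual binary convention of smoothing the target toward 0.5; the helper name and `eps` value are my own). With a smoothed target, the loss attains its minimum at a finite logit, so the network is no longer pushed toward ever-larger logits:

```python
import math

def smoothed_bce(target, logit, eps=0.2):
    """Binary cross entropy with label smoothing (sketch; names are mine).
    Smoothing moves the hard 0/1 target toward 0.5 by eps."""
    t = target * (1.0 - eps) + 0.5 * eps
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

# With eps=0.2 the smoothed target is 0.9, and the loss is minimized near
# logit = log(0.9/0.1) ~ 2.2, rather than at an arbitrarily large logit:
print(smoothed_bce(1, 2.2))   # lower than...
print(smoothed_bce(1, 10.0))  # ...the loss at a much larger logit
```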

Having said that, you can penalize the network differently for various cases - tp, tn, fp, fn - similarly to what is described in Weight samples if incorrect guessed in binary cross entropy. (It seems like the implementation there is actually incorrect. You want to use corresponding elements of the weight_tensor to weight individual log(sigmoid(...)) terms, not the final output of cross_entropy).
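A per-case weighting along those lines might look like the following sketch (the function and weight names are hypothetical; it weights the individual log terms per example, and the 0.5 threshold only decides which of the four cases an example currently falls into):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def case_weighted_bce(targets, logits, w_tp=1.0, w_fn=1.0, w_tn=1.0, w_fp=1.0):
    """Weight each example's log term by which case (tp/fn/tn/fp) it
    currently falls into. Sketch only; names are hypothetical."""
    p = sigmoid(logits)
    pred = p >= 0.5
    # positive targets: tp if predicted positive, fn otherwise
    w_pos = np.where(pred, w_tp, w_fn)
    # negative targets: fp if predicted positive, tn otherwise
    w_neg = np.where(pred, w_fp, w_tn)
    loss = -(targets * w_pos * np.log(p)
             + (1.0 - targets) * w_neg * np.log(1.0 - p))
    return loss.mean()
```

For the question's setup, something like `w_fn=10` would encode the opportunity cost of a missed positive. In a real TensorFlow implementation the case weights should be treated as constants with respect to the gradient (e.g. computed under `tf.stop_gradient`) so that only the log terms are differentiated.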

Using this scheme, you might want to penalize very wrong answers much more than almost right answers. However, note that this is already happening to a degree because of the shape of log(sigmoid(...)).