I am trying to classify some CXR images that have multiple labels per sample. From what I understand, I have to use a dense output layer with sigmoid activations and binary crossentropy as my loss function. The issue is that there is a large class imbalance (many more normals than abnormals). Here is my model so far:
import keras
from keras_applications.resnet_v2 import ResNet50V2
from keras.layers import GlobalAveragePooling2D, Dense
from keras import Sequential

ResNet = Sequential()
# keras_applications models need the backend/layers/models/utils modules
# passed in explicitly
ResNet.add(ResNet50V2(input_shape=shape, include_top=False, weights=None,
                      backend=keras.backend,
                      layers=keras.layers,
                      models=keras.models,
                      utils=keras.utils))
ResNet.add(GlobalAveragePooling2D(name='avg_pool'))
ResNet.add(Dense(len(label_counts), activation='sigmoid', name='Final_output'))
As you can see, I am using a sigmoid to get an output, but I am a bit confused as to how to implement the weights. I think I need a custom loss function that uses BCE with from_logits=True. Something like this:
import tensorflow as tf

xent = tf.keras.losses.BinaryCrossentropy(
    from_logits=True,
    reduction=tf.keras.losses.Reduction.NONE)
# scale each sample's loss by its weight, then average
loss = tf.reduce_mean(xent(targets, pred) * weights)
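One way I could imagine handling the imbalance (a sketch, not tested; train_labels is a placeholder name for my (num_samples, num_classes) 0/1 label matrix) is to compute a per-class pos_weight from the label frequencies and use tf.nn.weighted_cross_entropy_with_logits, which up-weights the rare positives:

import numpy as np
import tensorflow as tf

# Per-class weights: ratio of negatives to positives, so the rare
# positive (abnormal) labels count more in the loss.
positives = train_labels.sum(axis=0)
negatives = train_labels.shape[0] - positives
pos_weight = (negatives / np.maximum(positives, 1)).astype('float32')

def weighted_bce(targets, logits):
    # weighted_cross_entropy_with_logits applies the sigmoid internally,
    # so `logits` must be the un-activated (linear) network output
    per_element = tf.nn.weighted_cross_entropy_with_logits(
        labels=targets, logits=logits, pos_weight=pos_weight)
    return tf.reduce_mean(per_element)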
So it treats the outputs as logits, but what I am unsure about is the activation of the final output. Do I keep the sigmoid activation, or do I use a linear activation (no activation)? I assume we keep the sigmoid and just treat it as a logit, but I am unsure, since PyTorch's torch.nn.BCEWithLogitsLoss contains a sigmoid layer itself.
EDIT: Found this: https://www.reddit.com/r/tensorflow/comments/dflsgv/binary_cross_entropy_with_from_logits_true/
As per pgaleone:

"from_logits=True means that the loss function expects a linear tensor (the output layer of your network without any activation function but the identity), so you have to remove the sigmoid, since it will be the loss function itself to apply the softmax [sic: for binary crossentropy it is a sigmoid] to your network output, and then to compute the cross-entropy"
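So, if I understand that correctly, the final layer should be linear and the sigmoid moves inside the loss. Something like this (a sketch, assuming tf.keras for the loss; shape and label_counts are the same placeholders as above):

# replace the sigmoid Dense layer from above with a linear one
ResNet.add(Dense(len(label_counts), activation=None, name='Final_output'))

ResNet.compile(optimizer='adam',
               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))

# the model now outputs logits, so predictions need an explicit sigmoid:
# probs = tf.sigmoid(ResNet.predict(x))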