I am trying to classify some CXR images that have multiple labels per sample. From what I understand, I have to use a dense output layer with sigmoid activations and binary crossentropy as my loss function. The issue is that there is a large class imbalance (many more normals than abnormals). Here is my model so far:
import keras
from keras_applications.resnet_v2 import ResNet50V2
from keras.layers import GlobalAveragePooling2D, Dense
from keras import Sequential

ResNet = Sequential()
# keras_applications models need the backend/layers/models/utils modules
# passed in explicitly
ResNet.add(ResNet50V2(input_shape=shape, include_top=False, weights=None,
                      backend=keras.backend,
                      layers=keras.layers,
                      models=keras.models,
                      utils=keras.utils))
ResNet.add(GlobalAveragePooling2D(name='avg_pool'))
ResNet.add(Dense(len(label_counts), activation='sigmoid', name='Final_output'))
As you can see, I am using a sigmoid to get an output, but I am a bit confused as to how to implement the weights. I think I need a custom loss function that uses BCE with from_logits=True. Something like this:
import tensorflow as tf

xent = tf.keras.losses.BinaryCrossentropy(
    from_logits=True,
    reduction=tf.keras.losses.Reduction.NONE)
# scale each sample's loss by its weight, then average
loss = tf.reduce_mean(xent(targets, pred) * weights)
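One way I could imagine handling the imbalance (a sketch, not tested; train_labels is a placeholder name for my (num_samples, num_classes) 0/1 label matrix) is to compute a per-class pos_weight from the label frequencies and use tf.nn.weighted_cross_entropy_with_logits, which up-weights the rare positives:

import numpy as np
import tensorflow as tf

# Per-class weights: ratio of negatives to positives, so the rare
# positive (abnormal) labels count more in the loss.
positives = train_labels.sum(axis=0)
negatives = train_labels.shape[0] - positives
pos_weight = (negatives / np.maximum(positives, 1)).astype('float32')

def weighted_bce(targets, logits):
    # weighted_cross_entropy_with_logits applies the sigmoid internally,
    # so `logits` must be the un-activated (linear) network output
    per_element = tf.nn.weighted_cross_entropy_with_logits(
        labels=targets, logits=logits, pos_weight=pos_weight)
    return tf.reduce_mean(per_element)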
So it treats the outputs as logits, but what I am unsure about is the activation of the final output. Do I keep the sigmoid activation, or do I use a linear activation (no activation)? I assume we keep the sigmoid and just treat it as a logit, but I am unsure, since PyTorch's torch.nn.BCEWithLogitsLoss contains a sigmoid layer itself.
EDIT: Found this: https://www.reddit.com/r/tensorflow/comments/dflsgv/binary_cross_entropy_with_from_logits_true/
As per pgaleone:

"from_logits=True means that the loss function expects a linear tensor (the output layer of your network without any activation function but the identity), so you have to remove the sigmoid, since it will be the loss function itself to apply the softmax [sic: for binary crossentropy it is a sigmoid] to your network output, and then to compute the cross-entropy"
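So, if I understand that correctly, the final layer should be linear and the sigmoid moves inside the loss. Something like this (a sketch, assuming tf.keras for the loss; shape and label_counts are the same placeholders as above):

# replace the sigmoid Dense layer from above with a linear one
ResNet.add(Dense(len(label_counts), activation=None, name='Final_output'))

ResNet.compile(optimizer='adam',
               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))

# the model now outputs logits, so predictions need an explicit sigmoid:
# probs = tf.sigmoid(ResNet.predict(x))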