Which loss function will converge well in multi-label image classification task?

Question

I've trained a multi-label, multi-class image classifier using sigmoid as the output activation function and binary_crossentropy as the loss function. The validation accuracy curve fluctuates up and down, while the loss curve shows strange (very high) values at a few epochs.
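To see how binary_crossentropy can produce very high values, here is a minimal pure-Python sketch of the per-sample loss for sigmoid outputs, where each label is treated as an independent yes/no decision. The clipping epsilon of 1e-7 is an assumption modeled on the Keras default, and the function names are illustrative, not the Keras API.

```python
import math

def sigmoid(z):
    """Numerically safe logistic function: maps a logit to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample.

    Each label is scored independently; eps (an assumed value) clips
    probabilities away from 0 and 1 so that log(0) never occurs.
    """
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0, 0]  # hypothetical 5-label target
mild = [sigmoid(z) for z in [3.0, -3.0, 2.0, -2.0, -3.0]]
confident_miss = [sigmoid(z) for z in [3.0, -3.0, -20.0, -2.0, -3.0]]
print(binary_crossentropy(labels, mild))            # small loss
print(binary_crossentropy(labels, confident_miss))  # one confident mistake dominates
```

A single label predicted near 0 when it is actually 1 contributes up to -log(1e-7), roughly 16.1, which can dominate the average over a batch and show up as a spike in the loss curve even when most predictions are fine.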

Below are the accuracy and loss curves for a VGG19 model fine-tuned on its last block, with Dropout and BatchNormalization.

[Figure: accuracy curve]
[Figure: loss curve]

These are the accuracy and loss curves for the same fine-tuned VGG19 model with Dropout, BatchNormalization and data augmentation.

[Figure: accuracy curve with data augmentation]
[Figure: loss curve with data augmentation]

I trained the classifier on 1800 training images (5 labels) and validated it on 100 images. The optimizer was SGD(lr=0.001, momentum=0.99). Can anyone explain why the loss curve shows such strange, high values at some epochs? Should I use a different loss function? If so, which one?
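One factor that may contribute to the spikes is the optimizer setting itself. A minimal sketch, assuming Keras-style classical momentum (velocity = momentum * velocity - lr * grad, then weights += velocity): under a constant gradient the step size grows toward lr / (1 - momentum), so lr=0.001 with momentum=0.99 ends up taking steps about 100 times larger than lr alone. The function name momentum_steps is hypothetical.

```python
def momentum_steps(lr, momentum, grad=1.0, n_steps=1000):
    """Per-step update sizes of SGD with classical momentum under a constant gradient."""
    velocity = 0.0
    steps = []
    for _ in range(n_steps):
        # Velocity accumulates past gradients, decayed by `momentum` each step.
        velocity = momentum * velocity - lr * grad
        steps.append(-velocity)  # magnitude of the weight change this step
    return steps

steps = momentum_steps(lr=0.001, momentum=0.99)
print(steps[0], steps[-1])  # first step vs. step after velocity has built up
```

The step after n updates is lr * (1 - momentum**n) / (1 - momentum), so with momentum=0.99 it approaches 0.1 rather than 0.001. Large accumulated steps like this can overshoot and briefly raise the loss.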

Frederik Bode · Accepted Answer · 2020-02-10T13:56:09

Don't worry, all is well. The loss curve on its own doesn't say much, and occasional spikes in it are perfectly normal while the model is still training. Look at your accuracy curve instead: I think it goes up fairly normally.
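To read the overall trend through occasional spikes, one option is to smooth the curve before judging it. The sketch below is an exponential moving average, similar in spirit to the smoothing slider in TensorBoard. The function name ema_smooth, the default weight of 0.9 and the loss values are all made up for illustration.

```python
def ema_smooth(values, weight=0.9):
    """Exponentially smoothed copy of a metric curve; a higher weight gives a smoother curve."""
    if not values:
        return []
    smoothed = []
    last = values[0]
    for v in values:
        # Blend the running average with the new point so a single spike is damped.
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

toy_loss = [0.9, 0.7, 0.6, 5.0, 0.5, 0.45, 0.4]  # made-up values with one spike
print(ema_smooth(toy_loss))
```

If the smoothed curve keeps trending down, a few isolated spikes are not a reason to change the loss function.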
