Why does Keras loss drop dramatically after the first epoch?

Question

I'm training a U-Net CNN in Keras/Tensorflow and find that loss massively decreases between the last batch of the first epoch, and the first batch of the second epoch:

Epoch 00001: loss improved from inf to 0.07185 - categorical_accuracy: 0.8636
Epoch 2/400: 1/250 [.....................] - loss: 0.0040 - categorical_accuracy: 0.8878

Weirdly categorical accuracy does not drop with loss, but increases slightly. After the drop in loss, it doesn't decrease further, but settles around the lower value. I know this is very little information on the problem, but this behaviour might indicate a common problem I can investigate more?

Some extra info: Optimizer = Adam(lr=1e-4)(Lowering lr didn't seem to help)

Loss: 'class weighted categorical cross entropy', calculated as follows

def class_weighted_categorical_crossentropy(class_weights):
        
        def loss_function(y_true, y_pred):

        # scale preds so that the class probas of each sample sum to 1
        y_pred /= tf.reduce_sum(y_pred, -1, True)
        # manual computation of crossentropy
        epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)

        # Multiply each class by its weight:
        classes_list = tf.unstack(y_true * tf.math.log(y_pred), axis=-1)
        for i in range(len(classes_list)):
            classes_list[i] = tf.scalar_mul(class_weights[i], classes_list[i])

        # Return weighted sum:
        return - tf.reduce_sum(tf.stack(classes_list, axis=-1), -1)

    return loss_function

Any ideas/sanity checks are much appreciated!

EDIT:This is the loss plot for training, I didn't have time to neaten it up, its loss plotted per step, not epoch, and you can see the shift to epoch 2 after 250 steps, up until that point the loss curve seems very good, but the shift two epoch two seems strange.

There is nothing wrong, the cross entropy loss considers the confidence of a class (probability), and if its predicting the correct class. Accuracy only considers the correct class being predicted, without considering confidence scores, so its normal that loss can change without accuracy changing. — Dr. Snoopy

Eric McLachlan Eric McLachlan · Accepted Answer · 2020-07-15T08:25:16

That sounds right to me. Remember, there is an inverse relationship between loss and accuracy, so as loss decreases, accuracy increases.

My understanding is that, during the first epoch, you basically have a neural network with more-or-less random initial state. After the first epoch, the weights of the neural network will be adjusted often by minimize the loss function (which as previously states is effectively the same as maximizing accuracy). So, at the beginning of the second epoch, your loss should be a lot better (i.e. lower). That means that your neural network is learning.

Why does Keras loss drop dramatically after the first epoch?

1 Answers