LSTM higher validation accuracy at higher validation loss ( Python: Keras )

Question

When training my LSTM ( using the Keras library in Python ) the validation loss keeps increasing, although it eventually does obtain a higher validation accuracy. Which leads me to 2 questions:

How/Why does it obtain a (significantly) higher validation accuracy at a (significantly) higher validation loss?
Is it problematic that the validation loss increases? ( because it eventually does obtain a good validation accuracy either way )

This is an example history log of my LSTM for which this applies:

As visible when comparing epoch 0 with epoch ~430:

52% val accuracy at 1.1 val loss vs. 61% val accuracy at 1.8 val loss

For the loss function I'm using tf.keras.losses.CategoricalCrossentropy and I'm using the SGD optimizer at a high learning rate of 50-60% ( as it obtained the best validation accuracy with it ).

Initially I thought it may be overfitting, but then I don't understand how the validation accuracy does eventually get quite a lot higher at almost 2 times as high of a validation loss.

Any insights would be much appreciated.

EDIT: Another example of a different run, less fluctuating validation accuracy but still significantly higher validation accuracy as the validation loss increases:

In this run I used a low instead of high dropout.

Frightera Frightera · Accepted Answer · 2021-01-24T16:10:54

As you stated, "at a high learning rate of 50-60%", this might be the reason why graphs are oscillating. Lowering the learning rate or adding regularization should solve the oscillating problem.

More generally,

Cross Entropy loss is not a bounded loss, so having very badly outliers would make it explode.

Accuracy can go higher which means your model is able to learn the rest of the dataset except the outliers.
Validation set has too many outliers that causing the oscillation of the loss values.

To conclude if you are overfitting or not, you should inspect validation set for outliers.

LSTM higher validation accuracy at higher validation loss ( Python: Keras )

1 Answers