
When training my LSTM (using the Keras library in Python), the validation loss keeps increasing, although it eventually does obtain a higher validation accuracy. This leads me to two questions:

  1. How/Why does it obtain a (significantly) higher validation accuracy at a (significantly) higher validation loss?
  2. Is it problematic that the validation loss increases (given that it eventually reaches a good validation accuracy either way)?

This is an example history log of my LSTM for which this applies:

[Plot: training history showing validation loss and validation accuracy per epoch]

As can be seen when comparing epoch 0 with epoch ~430:

52% val accuracy at 1.1 val loss vs. 61% val accuracy at 1.8 val loss

For the loss function I'm using tf.keras.losses.CategoricalCrossentropy, and I'm using the SGD optimizer with a high learning rate of 50-60% (as it obtained the best validation accuracy).
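
For context, this is roughly how the model is compiled; the architecture, input shape, and number of classes below are placeholders rather than my exact values:

    import tensorflow as tf

    # Rough sketch of the setup; layer sizes, input shape and class count are placeholders.
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(None, 16)),   # placeholder timesteps/features
        tf.keras.layers.Dropout(0.2),                        # dropout level varied between runs
        tf.keras.layers.Dense(4, activation="softmax"),      # placeholder number of classes
    ])

    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),  # high learning rate, as mentioned above
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=["accuracy"],
    )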

Initially I thought it might be overfitting, but then I don't understand how the validation accuracy eventually ends up considerably higher at almost twice the validation loss.

Any insights would be much appreciated.

EDIT: Here is another example from a different run, with a less fluctuating validation accuracy, but still a significantly higher validation accuracy as the validation loss increases:

[Plot: training history of the second run, validation loss and validation accuracy per epoch]

In this run I used a low dropout rate instead of a high one.


1 Answer


As you stated, you are training "at a high learning rate of 50-60%"; this is likely the reason the graphs oscillate. Lowering the learning rate or adding regularization should solve the oscillation problem.
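
For example, something along these lines (the values are only illustrative, not tuned for your data, and the architecture is a placeholder):

    import tensorflow as tf

    # Illustrative only: a much lower learning rate plus some regularization.
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(
            64,
            input_shape=(None, 16),                      # placeholder shape
            dropout=0.3,                                  # dropout on the layer inputs
            recurrent_dropout=0.3,                        # dropout on the recurrent state
            kernel_regularizer=tf.keras.regularizers.l2(1e-4),
        ),
        tf.keras.layers.Dense(4, activation="softmax"),   # placeholder class count
    ])

    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=["accuracy"])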

More generally,

Cross-entropy loss is not bounded, so a few very badly predicted outliers can make it explode (see the small numeric sketch after the list below).

  • Accuracy can still go up, which means your model is able to learn the rest of the dataset apart from the outliers.
  • The validation set may contain too many outliers, which causes the loss values to oscillate.
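
Here is a small numeric sketch of those two points (the probabilities are made up for illustration): a model that becomes more accurate overall can still end up with a higher mean cross-entropy if a handful of samples are predicted confidently wrong.

    import numpy as np

    def cross_entropy(true_idx, probs):
        # Mean categorical cross-entropy: -log of the probability given to the true class.
        return -np.mean([np.log(p[i]) for i, p in zip(true_idx, probs)])

    def accuracy(true_idx, probs):
        return np.mean([np.argmax(p) == i for i, p in zip(true_idx, probs)])

    y_true = [0, 0, 0, 0]

    # "Early" model: mediocre everywhere, no extreme mistakes.
    early = [[0.55, 0.45], [0.52, 0.48], [0.45, 0.55], [0.48, 0.52]]

    # "Later" model: more samples correct, but one outlier is confidently wrong.
    later = [[0.90, 0.10], [0.85, 0.15], [0.80, 0.20], [0.001, 0.999]]

    print(accuracy(y_true, early), cross_entropy(y_true, early))  # 0.50 accuracy, ~0.70 loss
    print(accuracy(y_true, later), cross_entropy(y_true, later))  # 0.75 accuracy, ~1.85 loss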

To conclude whether you are overfitting or not, you should inspect the validation set for outliers.
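
A quick way to do that is to look at the per-sample losses on the validation set (a sketch; it assumes you already have a trained model and x_val / one-hot-encoded y_val arrays):

    import numpy as np
    import tensorflow as tf

    # Per-sample categorical cross-entropy on the validation set.
    probs = model.predict(x_val)
    loss_fn = tf.keras.losses.CategoricalCrossentropy(reduction="none")
    per_sample_loss = loss_fn(y_val, probs).numpy()

    # The samples with the largest loss are the candidate outliers.
    worst = np.argsort(per_sample_loss)[::-1][:20]
    for i in worst:
        print(i, per_sample_loss[i], "true:", np.argmax(y_val[i]), "pred:", np.argmax(probs[i]))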