I am training a recurrent neural network on LibriSpeech. I have tried different variations of learning rate, batch size, etc. In every run, one thing was the same: the validation loss saturates after 7 epochs. I thought this might be due to overfitting. But I noticed a strange behaviour: if I reset the Adam optimizer, i.e. its slot variables m and v, after those 7 epochs of training, the validation loss decreases to a new, lower minimum and then oscillates around that value for the rest of the training. My speculation is that over long periods of training the v slot variables become much smaller than the m slot variables, so resetting them produces this effect, but I am not sure.

So, do we need to reset the Adam optimizer after every fixed number of steps? And if not, why does the validation loss decrease to a new, lower minimum? I am using the default values of beta_1, beta_2 and epsilon for the Adam optimizer in TensorFlow.
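To make it concrete, the reset I am describing is roughly the following (a simplified sketch; `reset_adam_slots` is just a name I am using here, and it assumes an optimizer API that exposes `get_slot()`, e.g. `tf.keras.optimizers.Adam` in TF 2.x up to 2.10 or `tf.keras.optimizers.legacy.Adam` afterwards):

```python
import tensorflow as tf

def reset_adam_slots(optimizer, model):
    """Zero out Adam's m and v slot variables, leaving the model weights
    and the optimizer's iteration counter untouched."""
    for var in model.trainable_variables:
        for slot_name in ("m", "v"):
            slot = optimizer.get_slot(var, slot_name)
            slot.assign(tf.zeros_like(slot))

# called once after epoch 7, e.g. from an on_epoch_end callback:
# reset_adam_slots(model.optimizer, model)
```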
2 Answers
Not sure what is creating the behaviour, but I believe you might avoid it by using an adjustable learning rate. The Keras callback ReduceLROnPlateau makes that easy to do; documentation is here. Set it up to monitor the validation loss and it will automatically lower the learning rate by a specified factor if the validation loss fails to decrease over a specified number (patience) of consecutive epochs. I use a factor of 0.6 and a patience of 1. Give it a try and hopefully your validation loss will reach a lower level without resetting the optimizer.
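A minimal setup might look like this (`model`, `train_dataset` and `val_dataset` are placeholders for your own training pipeline):

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Lower the learning rate by a factor of 0.6 whenever val_loss fails to
# improve for 1 consecutive epoch.
reduce_lr = ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.6,
    patience=1,
    min_lr=1e-6,   # optional floor so the learning rate cannot shrink forever
    verbose=1,
)

# `model`, `train_dataset` and `val_dataset` are placeholders here.
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=50,
    callbacks=[reduce_lr],
)
```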
Resetting the optimizer's moving averages is definitely not standard practice (at least to my knowledge). In your case, resetting them might lead to a short-term increase in the effective learning rate due to the reset momentum, which would explain your two observations (see the numerical sketch after this list):
The network suddenly improves: the reset acts as a "kickstarter", i.e. the network is able to jump out of the previous local minimum. This indicates that it was previously stuck there, i.e. the learning rate was either too small (if the error curve was flat or decreasing very slowly) or too large (if it oscillated around that point).
The oscillations at the end: the increased step size carries the network into a new local minimum, but the steps are still too large to settle at its bottom (while not large enough to escape it again), so the loss oscillates around the new value.
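To see why a reset can temporarily boost the step size, here is a small numerical sketch (not your experiment, just Adam's per-parameter update rule applied to noisy, near-zero gradients with the default hyperparameters; bias correction is omitted because it is ≈ 1 once the iteration count is large):

```python
import numpy as np

# Toy illustration of Adam's per-parameter step size on a plateau:
# small, mostly-noise gradients, default beta_1/beta_2, bias correction omitted.
rng = np.random.default_rng(0)
beta1, beta2, eps, lr = 0.9, 0.999, 1e-7, 1e-3

m, v = 0.0, 0.0
for t in range(20000):                         # long "plateau" phase
    g = rng.normal(0.0, 1e-2)                  # zero-mean, noisy gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
step_before = lr * abs(m) / (np.sqrt(v) + eps)

m, v = 0.0, 0.0                                # "reset" the m and v slots
g = rng.normal(0.0, 1e-2)
m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * g * g
step_after = lr * abs(m) / (np.sqrt(v) + eps)  # ≈ lr * (1-beta1)/sqrt(1-beta2) ≈ 3 * lr

print(f"step before reset:      {step_before:.2e}")  # typically well below lr
print(f"step right after reset: {step_after:.2e}")   # a few times lr
```

The point is that right after a reset the very first updates have a magnitude on the order of the learning rate (or a few times it), regardless of how small the gradients are, which is exactly the kind of temporary "kick" described above.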
While you have made an interesting observation, it would require more research to validate that this is consistent behaviour and not just anecdotal. In general (and in your case), I would always recommend running a grid search over learning rate and batch size (remember to account for the random initial weights by repeating each configuration a couple of times) until you find a training curve that neither saturates nor overfits too early, while also not wasting too many training resources on a very large number of very small gradient updates. Both early stopping and learning rate decay can help with this, even if you are using Adam. These are established and well-researched practices that will almost always work, given you have some patience for the tuning process.
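A rough sketch of what such a search could look like (`build_model()`, `loss_fn` and the dataset names are placeholders for your own RNN, loss and LibriSpeech pipeline; the grid values are only examples):

```python
import itertools
import tensorflow as tf

# Grid search over learning rate and batch size, with early stopping.
# `build_model()`, `loss_fn`, `train_dataset` and `val_dataset` are placeholders.
learning_rates = [1e-3, 5e-4, 1e-4]
batch_sizes = [32, 64, 128]
n_repeats = 3  # repeat each configuration to average over random initialisations

results = {}
for lr, bs in itertools.product(learning_rates, batch_sizes):
    val_losses = []
    for _ in range(n_repeats):
        model = build_model()  # placeholder: returns a fresh, uncompiled model
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss=loss_fn)  # placeholder loss, e.g. your CTC loss
        history = model.fit(
            train_dataset.batch(bs),
            validation_data=val_dataset.batch(bs),
            epochs=30,
            callbacks=[tf.keras.callbacks.EarlyStopping(
                monitor="val_loss", patience=3, restore_best_weights=True)],
            verbose=0,
        )
        val_losses.append(min(history.history["val_loss"]))
    results[(lr, bs)] = sum(val_losses) / len(val_losses)

best = min(results, key=results.get)
print("best (learning_rate, batch_size):", best, "mean val_loss:", results[best])
```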