I'm trying to train a CNN text classifier with PyTorch. I'm using the Adam optimizer like this:
optimizer = torch.optim.Adam(CNN_Text.parameters(), lr=args.lr)
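For context, the rest of my training step follows the usual pattern, roughly like this (model stands for my CNN_Text instance; train_loader and the loss function are placeholders for my actual code):

import torch.nn.functional as F

for epoch in range(args.epochs):
    model.train()
    for batch_x, batch_y in train_loader:   # placeholder DataLoader over text batches
        optimizer.zero_grad()                # clear gradients from the previous step
        logits = model(batch_x)
        loss = F.cross_entropy(logits, batch_y)
        loss.backward()
        optimizer.step()                     # Adam update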
I noticed that training converges really fast at first, and then accuracy slowly degrades: the validation loss drops a lot within 1-2 minutes, and after that it keeps increasing slowly.
So I implemented learning-rate decay like this:
if curr_loss > val_loss:   # current validation loss is worse than the previous one
    for param_group in optimizer.param_groups:
        prev_lr = param_group['lr']
        param_group['lr'] = prev_lr / 10
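This check runs after each validation pass; in context it looks roughly like this (train_one_epoch and evaluate are placeholder helpers, and val_loss holds the previous validation loss):

val_loss = float('inf')   # previous/best validation loss

for epoch in range(args.epochs):
    train_one_epoch(model, optimizer, train_loader)   # placeholder helper
    curr_loss = evaluate(model, val_loader)           # placeholder helper, returns validation loss

    if curr_loss > val_loss:
        # validation loss got worse, so cut the learning rate by 10x
        for param_group in optimizer.param_groups:
            param_group['lr'] = param_group['lr'] / 10
    else:
        val_loss = curr_loss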
I found that it didn't really help much. But if I manually save the model, load it, and resume training with a decreased learning rate, it gets much better performance!
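Concretely, the manual procedure that works well is roughly this (the checkpoint path and the new learning rate are just examples):

# stop training and save a checkpoint
torch.save(model.state_dict(), 'checkpoint.pt')       # example path

# reload the weights and resume with a 10x smaller learning rate
model.load_state_dict(torch.load('checkpoint.pt'))
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr / 10)
# ...then run the same training loop again with the new optimizer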
This is a pain because I have to keep watching the training run and change the learning rate by hand. I tried SGD and other optimizers because I thought this might be a problem specific to Adam, but I couldn't find a good solution.
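For example, swapping in SGD only changed the optimizer construction, something like this (the momentum value is just a placeholder):

optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)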
Can anyone help me with this?