
I'm trying to train a CNN text classifier with PyTorch. I'm using the Adam optimizer like this:

optimizer = torch.optim.Adam(CNN_Text.parameters(), lr=args.lr)

I noticed that training converges very quickly and then accuracy slowly degrades: the validation loss drops a lot in the first 1-2 minutes, then keeps increasing slowly.

So I implemented learning-rate decay:

if curr_loss > val_loss:
    prev_lr = param_group['lr']
    param_group['lr'] = prev_lr / 10

It didn't help much. However, if I manually save the model, reload it, and resume training with a lower learning rate, performance improves considerably!

This is a hassle because I have to keep watching the training and change the options by hand. I also tried SGD and other optimizers, thinking this might be a problem with Adam, but I couldn't find a good solution.

Can anyone help me with this?


2 Answers


What is param_group? In that snippet it looks like a variable that is not tied to the optimizer in any way. What you need to modify is the 'lr' entry of each element of optimizer.param_groups, which is what Adam actually reads.
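As a minimal sketch of that idea: the snippet below divides the learning rate by 10 by writing to optimizer.param_groups directly. The Linear model and the decay_lr helper are hypothetical stand-ins for illustration, not code from the question.

```python
import torch

# Hypothetical stand-in for the CNN text classifier in the question.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def decay_lr(optimizer, factor=0.1):
    """Multiply the learning rate of every parameter group by `factor`."""
    # optimizer.param_groups is the list of dicts the optimizer reads on each
    # step, so changing 'lr' here takes effect on the next optimizer.step().
    for param_group in optimizer.param_groups:
        param_group["lr"] *= factor


decay_lr(optimizer)
lrs = [group["lr"] for group in optimizer.param_groups]
```

In the question's loop, decay_lr(optimizer) would be called inside the branch that detects that the validation loss got worse.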

Either way, unless you have a good reason to hand-roll it, I suggest using one of the LR schedulers that ship with PyTorch (in torch.optim.lr_scheduler). And if you do need to reimplement it, check out their code and take inspiration from there.
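For the behaviour described in the question (lower the rate when the validation loss stops improving), ReduceLROnPlateau is probably the closest built-in match. Here is a rough sketch; the model, the patience value and the list of validation losses are made up for illustration.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Hypothetical stand-in for the CNN text classifier in the question.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Divide the learning rate by 10 once the monitored value has failed to
# improve for more than `patience` consecutive calls to step().
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

# Made-up validation losses: improve at first, then get steadily worse.
val_losses = [1.0, 0.8, 0.9, 0.95, 1.0, 1.1]
lr_history = []
for val_loss in val_losses:
    # ... one epoch of training and validation would go here ...
    scheduler.step(val_loss)  # pass the metric being monitored
    lr_history.append(optimizer.param_groups[0]["lr"])
```

Unlike most other schedulers, this one needs the monitored metric passed to step(), so it should be called after each validation pass.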


The problem is that Adam has additional internal state (running averages of the gradients, etc.) that also needs to be reset.

For this reason, you have a better chance if you delete the optimiser and instantiate a new one with a lower learning rate.

At least that worked for me.
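A minimal sketch of what I mean, using a hypothetical Linear model as a stand-in for the real network. It only shows that a freshly built Adam starts with empty state; whether that is what fixes training in your case is my assumption.

```python
import torch

# Hypothetical stand-in for the CNN text classifier in the question.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One dummy training step so that Adam builds up its per-parameter state.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
state_entries_before = len(optimizer.state)  # one entry per updated parameter

# Replace the optimiser with a new one at a tenth of the learning rate.
# The model weights are untouched; only Adam's internal state starts over.
new_lr = optimizer.param_groups[0]["lr"] / 10
optimizer = torch.optim.Adam(model.parameters(), lr=new_lr)
state_entries_after = len(optimizer.state)  # empty until the next step()
```

Saving and reloading only the model, as in the question, has the same effect, since the optimiser is then built from scratch.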