I'm trying to train a CNN text classifier with PyTorch. I'm using the Adam optimizer like this:
optimizer = torch.optim.Adam(CNN_Text.parameters(), lr=args.lr)
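For context, the rest of my training step follows the usual pattern, roughly like this (model stands for my CNN_Text instance; train_loader and the loss function are placeholders for my actual code):

import torch.nn.functional as F

for epoch in range(args.epochs):
    model.train()
    for batch_x, batch_y in train_loader:   # placeholder DataLoader over text batches
        optimizer.zero_grad()                # clear gradients from the previous step
        logits = model(batch_x)
        loss = F.cross_entropy(logits, batch_y)
        loss.backward()
        optimizer.step()                     # Adam update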
I noticed that training converges really fast at first, and then accuracy slowly degrades: the validation loss drops a lot within 1-2 minutes, and after that it keeps increasing slowly.
So I implemented learning-rate decay like this:
if curr_loss > val_loss:   # current validation loss is worse than the previous one
    for param_group in optimizer.param_groups:
        prev_lr = param_group['lr']
        param_group['lr'] = prev_lr / 10
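This check runs after each validation pass; in context it looks roughly like this (train_one_epoch and evaluate are placeholder helpers, and val_loss holds the previous validation loss):

val_loss = float('inf')   # previous/best validation loss

for epoch in range(args.epochs):
    train_one_epoch(model, optimizer, train_loader)   # placeholder helper
    curr_loss = evaluate(model, val_loader)           # placeholder helper, returns validation loss

    if curr_loss > val_loss:
        # validation loss got worse, so cut the learning rate by 10x
        for param_group in optimizer.param_groups:
            param_group['lr'] = param_group['lr'] / 10
    else:
        val_loss = curr_loss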
I found that it didn't really help much. But if I manually save the model, load it, and resume training with a decreased learning rate, it gets much better performance!
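Concretely, the manual procedure that works well is roughly this (the checkpoint path and the new learning rate are just examples):

# stop training and save a checkpoint
torch.save(model.state_dict(), 'checkpoint.pt')       # example path

# reload the weights and resume with a 10x smaller learning rate
model.load_state_dict(torch.load('checkpoint.pt'))
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr / 10)
# ...then run the same training loop again with the new optimizer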
This is a pain because I have to keep watching the training run and change the learning rate by hand. I tried SGD and other optimizers because I thought this might be a problem specific to Adam, but I couldn't find a good solution.
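For example, swapping in SGD only changed the optimizer construction, something like this (the momentum value is just a placeholder):

optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)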
Can anyone help me with this?