1 vote

I'm trying to train a CNN text classifier with PyTorch. I'm using the Adam optimizer like this:

optimizer = torch.optim.Adam(CNN_Text.parameters(), lr=args.lr)

I noticed that the optimizer converges very quickly and then the accuracy slowly degrades: the validation loss drops a lot within 1-2 minutes, then keeps increasing slowly.

So I implemented learning-rate decay:

if curr_loss > val_loss:
    prev_lr = param_group['lr']
    param_group['lr'] = prev_lr / 10

It didn't really help much. But if I manually save the model, load it, and resume training with the decreased learning rate, I get much better performance!

This is painful because I have to keep watching the training and manually change the settings. I tried SGD and other optimizers, thinking this might be an Adam-specific problem, but I couldn't find a good solution.

Can anyone help me with it?


2 Answers

2 votes

What is param_group? From that code snippet it looks like a variable not associated with the optimizer in any way. What you need to modify is the 'lr' entry of each element of optimizer.param_groups, which is what Adam actually looks at.
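For example, a minimal sketch (assuming optimizer, curr_loss, and val_loss come from your training loop):

if curr_loss > val_loss:
    # Decay the learning rate the optimizer actually uses
    for param_group in optimizer.param_groups:
        param_group['lr'] = param_group['lr'] / 10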

Either way, unless you have a good reason to hand-roll it yourself, I suggest you use the LR scheduler provided with PyTorch. And if you do need to reimplement it, check out its code and take inspiration from there.
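For instance, torch.optim.lr_scheduler.ReduceLROnPlateau implements essentially the logic you describe. A minimal runnable sketch (the model, training step, and validation loss here are toy stand-ins, not your actual code):

import torch
import torch.nn as nn

# Toy stand-in for the CNN_Text model from the question
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Cut the learning rate by 10x when the monitored metric stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=2)

for epoch in range(20):
    # Dummy training step (stand-in for your real training loop)
    loss = model(torch.randn(4, 10)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    val_loss = 1.0 / (epoch + 1)  # stand-in for your real validation loss
    scheduler.step(val_loss)      # scheduler adjusts the 'lr' of each param group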

0 votes

The problem is that Adam keeps additional internal state (exponential moving averages of the gradients and of their squares) that also needs to be reset.

For this reason, you have a better chance if you delete the old optimiser and instantiate a new one with the lower learning rate.
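For example, a sketch reusing the names from the question:

# Drop the old optimizer (together with its running averages) and start
# fresh with a lower learning rate
optimizer = torch.optim.Adam(CNN_Text.parameters(), lr=args.lr / 10)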

At least that worked for me.