
I'm starting out with PyTorch and tried a simple linear regression example. I wrote a small implementation that should learn the equation 2*x+1, but the loss stays stuck around 120 and gradient descent never converges to a small value. I don't see what's wrong; this example should be very easy to fit. This is the code I'm using:

import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import numpy as np

# training data for the target function y = 2*x + 1
X = np.array([i for i in np.arange(1, 20)]).reshape(-1, 1)
X = torch.tensor(X, dtype=torch.float32, requires_grad=True)
y = np.array([2*i+1 for i in np.arange(1, 20)]).reshape(-1, 1)
y = torch.tensor(y, dtype=torch.float32, requires_grad=True)
print(X.shape, y.shape)

class LR(torch.nn.Module):
    def __init__(self, n_features, n_hidden1, n_out):
        super(LR, self).__init__()
        self.linear = torch.nn.Linear(n_features, n_hidden1)    # hidden layer
        self.predict = torch.nn.Linear(n_hidden1, n_out)        # output layer

    def forward(self, x):
        x = F.relu(self.linear(x))
        x = self.predict(x)
        return x

model = LR(1, 10, 1)    # 1 input feature, 10 hidden units, 1 output
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

def train(epochs=100):
    for e in range(epochs):
        pred = model(X)              # forward pass on the full dataset
        loss = loss_fn(pred, y)
        optimizer.zero_grad()        # clear gradients from the previous step
        loss.backward()
        optimizer.step()
        print(f"epoch: {e} and loss= {loss.item()}")

train()

The desired output is a small loss value, so that the model is trained well enough to give good predictions later.


1 Answer


Your learning rate is too large. The model takes a few steps in the right direction, but it can't settle into a good minimum; instead it keeps zigzagging around it. If you try lr=0.001 instead, your performance will be much better. This is also why it's often useful to decay the learning rate over time when using first-order optimizers.
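
For example, here is a minimal sketch of the fix, reusing model, X, y, and loss_fn from the question (the lr=0.001 value, the epoch count, and the StepLR settings are illustrative assumptions, not tuned values):

import torch

# smaller learning rate so SGD stops overshooting the minimum
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
# optional: decay the learning rate over time
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for e in range(200):
    pred = model(X)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()             # halves the learning rate every 50 epochs
    print(f"epoch: {e} and loss= {loss.item()}")

With the smaller step size the updates no longer overshoot, so the loss should keep decreasing instead of stalling around a fixed value.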