
The following code trains an MLP on images of size 64×64 (flattened to 4096 values), using the loss ||output - input||^2.

For some reason, the weights are not being updated from epoch to epoch, as shown at the end.

import time

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class MLP(nn.Module):
    def __init__(self, size_list):
        super(MLP, self).__init__()
        layers = []
        self.size_list = size_list
        # Hidden layers: Linear followed by ReLU
        for i in range(len(size_list) - 2):
            layers.append(nn.Linear(size_list[i], size_list[i + 1]))
            layers.append(nn.ReLU())
        # Output layer: Linear only, no activation
        layers.append(nn.Linear(size_list[-2], size_list[-1]))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model_1 = MLP([4096, 64, 4096])
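
Note that this model expects each input already flattened to a 4096-dimensional vector. If the data loader yields image batches of shape (N, 1, 64, 64), they would need to be flattened before the forward pass; a minimal sketch (the images tensor here is a made-up example, not part of the original code):

    # Hypothetical batch of 8 grayscale 64x64 images (illustrative only)
    images = torch.randn(8, 1, 64, 64)

    # Flatten each image to a 4096-dimensional vector before the forward pass
    flat = images.view(images.size(0), -1)   # shape: (8, 4096)

    outputs = model_1(flat)
    print(outputs.shape)                     # torch.Size([8, 4096])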

And for training each epoch:

def train_epoch(model, train_loader, criterion, optimizer):
    model.train()
    model.to(device)

    running_loss = 0.0

    start_time = time.time()
    # Train over one epoch of batches
    for batch_idx, data in enumerate(train_loader):
        optimizer.zero_grad()

        data = data.to(device)

        # Autoencoder objective: reconstruct the input
        outputs = model(data)
        loss = criterion(outputs, data)
        running_loss += loss.item()

        loss.backward()
        optimizer.step()

    end_time = time.time()

    weight_ll = model.net[0].weight
    running_loss /= len(train_loader)

    print('Training Loss: ', running_loss, 'Time: ', end_time - start_time, 's')
    return running_loss, outputs, weight_ll

And the training loop:

n_epochs = 20
Train_loss = []
weights=[]

criterion = nn.MSELoss()

optimizer = optim.SGD(model_1.parameters(), lr = 0.1)


for i in range(n_epochs):
    train_loss, output, weights_ll = train_epoch(model_1, trainloader, criterion, optimizer)
    Train_loss.append(train_loss)
    weights.append(weights_ll)
    print('='*20)

Now, when I print the weights of the first fully connected layer per epoch, they aren't being updated:

print(weights[0][0])
print(weights[19][0])

The output of the above (showing the weights at epoch 0 and at epoch 19) is:

tensor([ 0.0086,  0.0069, -0.0048,  ..., -0.0082, -0.0115, -0.0133],
       grad_fn=<SelectBackward>)
tensor([ 0.0086,  0.0069, -0.0048,  ..., -0.0082, -0.0115, -0.0133],
       grad_fn=<SelectBackward>)

What may be going wrong? Looking at my loss, it's decreasing at a steady rate but there is no change in the weights.

Comments:
zihaozhihao: Did you check the computed gradients of your network just after loss.backward()?

dankpenny: @zihaozhihao After printing out grad = model.net[0].weight.grad for each epoch, it seems like even they are not being updated!

zihaozhihao: You are updating the weights, but the way you print them out is sort of incorrect. Please check my answer.
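
For reference, the gradient check suggested in the first comment could look like this (an illustrative sketch, not code from the thread); it would run right after loss.backward():

    # After loss.backward(), each parameter that received a gradient has a
    # non-None .grad; a zero norm would mean no update is possible
    for name, param in model_1.named_parameters():
        if param.grad is not None:
            print(name, param.grad.norm().item())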

1 Answer


Try changing it to weight_ll = model.net[0].weight.clone().detach() (or just weight_ll = model.net[0].weight.clone()) in your train_epoch() function, and you will see the weights differ.

Explanation: if you do not clone it, weight_ll is just a reference to the layer's parameter tensor, so every entry in your weights list points at the same live tensor and always shows the latest values. That's why weights[0][0] equals weights[19][0]: they are actually the same tensor.
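
A minimal sketch of the difference (illustrative; uses a tiny stand-in layer rather than the model above):

    import torch
    import torch.nn as nn

    layer = nn.Linear(4, 4)
    ref = layer.weight                    # reference: tracks the live parameter
    snap = layer.weight.clone().detach()  # snapshot: frozen copy of current values

    # Simulate an optimizer step by changing the parameter in place
    with torch.no_grad():
        layer.weight.add_(1.0)

    print(torch.equal(ref, layer.weight))   # True  -> ref is the same tensor
    print(torch.equal(snap, layer.weight))  # False -> the snapshot kept the old values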