I have been using PyTorch for a while now. One question I have regarding backprop is as follows:
let's say we have a loss computed for a neural network. To run backprop, I have seen two different versions. One like:
optimizer.zero_grad()
torch.autograd.backward(loss)
optimizer.step()
and the other one like:
optimizer.zero_grad()
loss.backward()
optimizer.step()
Which one should I use? Is there any difference between these two versions?
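To make the context concrete, here is a minimal, self-contained sketch of the training step I have in mind (the model, optimizer, and data below are just toy placeholders I made up):

import torch
import torch.nn as nn

# placeholder model, optimizer, loss, and data for illustration only
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x = torch.randn(8, 4)
y = torch.randn(8, 1)

# version 1: calling torch.autograd.backward on the loss tensor
optimizer.zero_grad()
loss = criterion(model(x), y)
torch.autograd.backward(loss)
optimizer.step()

# version 2: calling .backward() directly on the loss tensor
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()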
As a last question: do we need to specify requires_grad=True
for the parameters of every layer of our network to make sure their gradients are computed during backprop?
For example, do I need to specify it for the layer nn.Linear(hidden_size, output_size)
inside my network, or is it automatically set to True by default?
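For reference, this is how I would inspect the flag on a single layer (the sizes are placeholders I picked just for illustration):

import torch.nn as nn

layer = nn.Linear(16, 4)  # placeholder sizes for illustration
for name, p in layer.named_parameters():
    print(name, p.requires_grad)  # prints the requires_grad flag for 'weight' and 'bias'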