I created a fully connected network in PyTorch with an input layer of shape (1, 784) and a first hidden layer of shape (1, 256).
In short: nn.Linear(in_features=784, out_features=256, bias=True)
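Here is a minimal snippet reproducing what I see for this layer in isolation (no training involved):

```python
import torch
import torch.nn as nn

# The first layer of my network, on its own
fc1 = nn.Linear(in_features=784, out_features=256, bias=True)

# The weight comes out as (256, 784), not the (784, 256) I expected
print(fc1.weight.shape)  # torch.Size([256, 784])
print(fc1.bias.shape)    # torch.Size([256])
```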
Method 1: model.fc1.weight.data.shape gives me torch.Size([128, 256]), while
Method 2: list(model.parameters())[0].shape gives me torch.Size([256, 784])
Between an input layer of size 784 and a hidden layer of size 256, I was expecting a weight matrix of shape (784, 256).
So in the first case I see the size of the next hidden layer (128), which makes no sense for the weights between the input and the first hidden layer, and in the second case it looks like PyTorch stored the transpose of the weight matrix.
I don't really understand how PyTorch shapes its weight matrices, or how to access individual weights after training. Should I use method 1 or method 2? When I print the corresponding tensors, the outputs look identical, even though the shapes differ.
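For completeness, here are the two access patterns side by side, on a hypothetical model matching my description (the class name and the fc1/fc2 attribute names are just for illustration):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Hypothetical fully connected network: 784 -> 256 -> 128."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)

model = Net()

# Method 1: access the layer by attribute name
print(model.fc1.weight.data.shape)

# Method 2: access via the parameter list
# (the order follows the order the submodules were registered in __init__)
print(list(model.parameters())[0].shape)
```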