
I created a fully connected network in PyTorch with an input layer of shape (1, 784) and a first hidden layer of shape (1, 256). In short: nn.Linear(in_features=784, out_features=256, bias=True)

Method 1: model.fc1.weight.data.shape gives me torch.Size([128, 256]), while

Method 2: list(model.parameters())[0].shape gives me torch.Size([256, 784])

In fact, between an input layer of size 784 and a hidden layer of size 256, I was expecting a weight matrix of shape (784, 256). So in the first case I see the size of the next hidden layer (128), which does not make sense for the weights between the input and first hidden layer, and in the second case it looks like PyTorch stored the transpose of the weight matrix.

I don't really understand how PyTorch shapes the different weight matrices, or how I can access individual weights after training. Should I use method 1 or 2? When I display the corresponding tensors, the outputs look totally similar, while the shapes are different.

Can you add code for the model class? - Umang Gupta

1 Answer


In PyTorch, nn.Linear stores its weight matrix with shape (out_features, in_features) and transposes it before applying the matmul to the input. That's why the dimensions look flipped from what you expect: instead of [784, 256], you observe [256, 784].

You can see this in the PyTorch source for nn.Linear, where we have:

...

self.weight = Parameter(torch.Tensor(out_features, in_features))

...

def forward(self, input):
    return F.linear(input, self.weight, self.bias)

When looking at the implementation of F.linear, we see the corresponding line that multiplies the input matrix with the transpose of the weight matrix:

output = input.matmul(weight.t())
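A quick sanity check makes both points concrete. The sketch below (assuming a standard PyTorch install; the small Sequential model is a hypothetical stand-in for yours) verifies the stored weight shape, reproduces the forward pass with an explicit transpose, and shows that your two access methods point at the very same tensor:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# nn.Linear stores its weight as (out_features, in_features)
layer = nn.Linear(in_features=784, out_features=256, bias=True)
print(layer.weight.shape)  # torch.Size([256, 784])
print(layer.bias.shape)    # torch.Size([256])

# forward() multiplies the input by the transpose of the weight
x = torch.randn(1, 784)
manual = x.matmul(layer.weight.t()) + layer.bias
assert torch.allclose(layer(x), manual)

# Accessing individual weights after training: both routes return the
# same tensor object, so method 1 and method 2 are interchangeable.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128))
first_weight = list(model.parameters())[0]
assert first_weight is model[0].weight
print(first_weight[0, :5])  # first 5 incoming weights of hidden unit 0
```

Note that first_weight[i, j] is the weight from input feature j to hidden unit i, i.e., row i collects all 784 incoming weights of one hidden unit.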