
Consider the convolutional neural network (two convolutional layers):

import torch
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

The fully connected layer fc is supposed to receive 7*7*32 inputs. However, the line

out = out.reshape(out.size(0), -1)

leads to a tensor of size (32, 49), which doesn't match the input dimensions the dense layer expects. What am I missing here?

[Note that in PyTorch the input is in the format [N, C, H, W], so the number of channels comes before the height and width of the image.]
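
For reference, a minimal sketch of that layout (the batch size of 4 here is just an arbitrary example):

import torch

x = torch.rand(4, 1, 28, 28)  # [N, C, H, W]: a batch of 4 single-channel 28x28 images
print(x.shape)                # torch.Size([4, 1, 28, 28])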

source: https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/convolutional_neural_network/main.py#L35-L56


1 Answer


If you look at the output of each layer, you can easily see what you are missing.

def forward(self, x):
    print('input', x.size())
    out = self.layer1(x)
    print('layer1-output', out.size())
    out = self.layer2(out)
    print('layer2-output', out.size())
    out = out.reshape(out.size(0), -1)
    print('reshape-output', out.size())
    out = self.fc(out)
    print('Model-output', out.size())
    return out

model = ConvNet(num_classes=10)
test_input = torch.rand(4, 1, 28, 28)
model(test_input)

OUTPUT:

input torch.Size([4, 1, 28, 28])
layer1-output torch.Size([4, 16, 14, 14])
layer2-output torch.Size([4, 32, 7, 7])
reshape-output torch.Size([4, 1568])
Model-output torch.Size([4, 10])

With kernel_size=5, stride=1, padding=2, the Conv2d layers here don't change the height and width of the tensor; the padding exactly compensates for the kernel size, so only the number of channels changes. Each MaxPool2d layer (kernel_size=2, stride=2) halves the height and width of the tensor.

input          = 4, 1, 28, 28
conv1_output   = 4, 16, 28, 28
max1_output    = 4, 16, 14, 14
conv2_output   = 4, 32, 14, 14
max2_output    = 4, 32, 7, 7
reshape_output = 4, 1568 (32*7*7)
fc_output      = 4, 10
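
A minimal sketch that verifies those intermediate shapes layer by layer, using the same hyperparameters as the model above (the variable names are only for illustration):

import torch
import torch.nn as nn

x = torch.rand(4, 1, 28, 28)

conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)

out = conv1(x)
print(out.shape)   # torch.Size([4, 16, 28, 28]) -- height and width unchanged
out = pool(out)
print(out.shape)   # torch.Size([4, 16, 14, 14]) -- height and width halved
out = conv2(out)
print(out.shape)   # torch.Size([4, 32, 14, 14])
out = pool(out)
print(out.shape)   # torch.Size([4, 32, 7, 7])
out = out.reshape(out.size(0), -1)
print(out.shape)   # torch.Size([4, 1568]), i.e. (4, 32*7*7)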

N --> input size, F --> filter size, stride --> stride size, pdg --> padding size

ConvTranspose2d;

OutputSize = N*stride + F - stride - pdg*2

Conv2d;

OutputSize = (N + pdg*2 - F)/stride + 1   [integer division: e.g. 32/3 = 10, the part after the decimal point is dropped]
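
A minimal sketch that encodes both formulas as plain Python functions and checks them against actual layers (the helper names conv2d_out and convtranspose2d_out are made up for this example; dilation and output_padding are assumed to be at their defaults):

import torch
import torch.nn as nn

def conv2d_out(n, f, stride=1, pdg=0):
    # output size of nn.Conv2d along one spatial dimension (dilation=1)
    return (n + pdg * 2 - f) // stride + 1

def convtranspose2d_out(n, f, stride=1, pdg=0):
    # output size of nn.ConvTranspose2d along one spatial dimension
    # (dilation=1, output_padding=0): N*stride + F - stride - pdg*2
    return n * stride + f - stride - pdg * 2

# check against real layers
x = torch.rand(4, 1, 28, 28)
conv = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
print(conv(x).shape[-1], conv2d_out(28, 5, stride=1, pdg=2))            # 28 28

y = torch.rand(4, 16, 14, 14)
tconv = nn.ConvTranspose2d(16, 1, kernel_size=5, stride=2, padding=2)
print(tconv(y).shape[-1], convtranspose2d_out(14, 5, stride=2, pdg=2))  # 27 27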