
I am using Python 3.8 and PyTorch 1.7.1. I saw some code that defines a Conv2d layer as follows:

Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)

The input 'X' being passed to it is a 4D tensor:

X.shape
# torch.Size([4, 3, 6, 6])

The output volume for this conv layer is:

c1(X).shape
# torch.Size([4, 6, 3, 3])

I am trying to use the formula to compute output spatial dimensions for any conv layer: O = ((W - K + 2P)/S) + 1, where W = spatial dimension of image, K = filter/kernel size, P = zero padding & S = stride.

For the 'c1' conv layer, W = 6, K = 3, S = 2 & P = 1. Using the formula, I get O = ((6 - 3 + (2 x 1)) / 2) + 1 = 5/2 + 1 = 3.5.

The full output volume is (4, 6, 3, 3), since the number of filters used is 6. But how is the spatial output from 'c1' then (3, 3)? What am I not getting?
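
For reference, here is a minimal runnable version of the setup above (the random input is only there to check shapes):

import torch
import torch.nn as nn

# the conv layer from the code I saw
c1 = nn.Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)

# a dummy batch of 4 images with 3 channels and 6 x 6 spatial size
X = torch.randn(4, 3, 6, 6)

print(X.shape)      # torch.Size([4, 3, 6, 6])
print(c1(X).shape)  # torch.Size([4, 6, 3, 3])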

Thanks!

1 Answer


How would you have half a pixel?

You're missing the floor function:

O = floor(((W - K + 2P)/S) + 1)

Here floor(3.5) = 3, so the shape of the output feature maps is (3, 3).
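
As a quick sanity check, plugging the numbers from the question into that formula (a minimal sketch):

import math

W, K, P, S = 6, 3, 1, 2
O = math.floor((W - K + 2 * P) / S + 1)
print(O)  # 3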


Here's the complete formula (with dilation) for nn.Conv2d:

H_out = floor((H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
W_out = floor((W_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

(using the padding, dilation, kernel_size and stride values of the corresponding dimension)
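
In code, that general formula can be written like this (a sketch; the helper name conv_out_size is mine) and checked against what nn.Conv2d actually produces:

import math
import torch
import torch.nn as nn

def conv_out_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    # per-dimension output size, following the nn.Conv2d documentation formula
    return math.floor((in_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

c1 = nn.Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
X = torch.randn(4, 3, 6, 6)

print(conv_out_size(6, kernel_size=3, stride=2, padding=1))  # 3
print(c1(X).shape)                                           # torch.Size([4, 6, 3, 3])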