1
votes

Suppose we have a 5x5 image and a 3x3 kernel with stride 2 and padding on. What is the size of the output image after it passes through a convolution layer in a neural network?

2 Answers

1
votes

The other answer is correct, but here is a drawing that visualizes why this formula holds:

I: Image size, K: Kernel size, P: Padding, S: Stride

I will explain the formula for a single direction only (shifting the filter to the right), since it's the same principle for the other direction.

Imagine you place the kernel (the filter) in the upper left corner of the padded image.

Then there are I-K+2*P pixels left over on the right-hand side. If your stride is S, you will be able to place the kernel on this remaining part at floor( (I-K+2*P)/S ) positions. You can verify that "floor" is needed with a 4x4 image: with K=3, P=1 and S=2, (I-K+2*P)/S = 1.5, so only one further position fits. You have to add one for the initial position of the kernel to get the total number of kernel positions.

Thus there are floor( (I-K+2*P)/S ) + 1 positions in total - which is the formula for your output size. Hope that helps.
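As a quick check of the counting argument above, here is a minimal sketch that slides the kernel along one axis and counts the placements, then compares the count with the formula. The function names are made up for illustration, and it assumes the kernel fits inside the padded image (K <= I+2*P).

```python
def count_kernel_positions(image_size, kernel_size, padding, stride):
    """Count kernel placements along one axis by sliding it explicitly."""
    padded_size = image_size + 2 * padding
    positions = 0
    left = 0  # index of the kernel's left edge in the padded image
    while left + kernel_size <= padded_size:
        positions += 1
        left += stride
    return positions


def formula_positions(image_size, kernel_size, padding, stride):
    """floor((I - K + 2*P) / S) + 1, using integer floor division."""
    return (image_size - kernel_size + 2 * padding) // stride + 1


# The 4x4 case where "floor" matters: (4 - 3 + 2*1) / 2 = 1.5
print(count_kernel_positions(4, 3, 1, 2), formula_positions(4, 3, 1, 2))
```

Both numbers should agree for any sizes where the kernel fits, which is what the drawing shows.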

0
votes

Let's consider a more general case:

Input is an image of size I*I, padded with P pixels on each side. The kernel has size K*K, and the stride is S in each direction. Then the output has size O*O, which can be computed using a simple formula:

O = [(I+2*P-K)/S]+1; where [] shows the floor function.

So, your answer is 3*3, since O=[(5+2*1-3)/2]+1=3 (taking "Padding On" to mean P=1).
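If you want to plug in other numbers, the formula can be sketched as a small helper. The name conv_output_size is hypothetical and does not come from any library; it assumes a square input, a square kernel, and the same padding and stride in both directions.

```python
def conv_output_size(i, k, p, s):
    """Output side length O = floor((I + 2*P - K) / S) + 1."""
    if i + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (i + 2 * p - k) // s + 1


# The case from the question: I=5, K=3, S=2, with padding taken as P=1
side = conv_output_size(5, 3, 1, 2)
print(f"{side}x{side}")  # 3x3

# Same image and kernel without padding (P=0)
print(conv_output_size(5, 3, 0, 2))  # 2
```

Note that the result depends on what "padding on" means: with P=1 the output is 3x3, while with P=0 it would be 2x2.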