According to this paper, the output shape is N + H - 1, where N is the input height or width and H is the kernel height or width. This is obviously the inverse process of convolution. This tutorial gives a formula for the output shape of a convolution, (W−F+2P)/S+1, where W is the input size, F the filter size, P the padding size, and S the stride. But in TensorFlow, there are test cases like:
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]
# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]
So we use y_shape, f_shape and x_shape with the formula (W−F+2P)/S+1 to calculate the padding size P. Here the deconvolution output y plays the role of the convolution input W, and x is the convolution output. From (12 - 3 + 2P) / 2 + 1 = 6, we get P = 0.5, which is not an integer. How does deconvolution work in TensorFlow?
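For reference, the arithmetic above can be reproduced with a small Python sketch. The helper names `conv_output_size` and `solve_padding` are my own and are not TensorFlow functions; they only encode the tutorial's formula (W−F+2P)/S+1 and assume the same padding P on both sides.

```python
from fractions import Fraction


def conv_output_size(w, f, p, s):
    """Convolution output size from the tutorial formula (W - F + 2P) / S + 1."""
    return Fraction(w - f + 2 * p) / s + 1


def solve_padding(w, f, s, out):
    """Solve (W - F + 2P) / S + 1 = out for P (hypothetical helper)."""
    return Fraction((out - 1) * s - w + f, 2)


# Shapes from the test case: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]
f_shape = [3, 3, 2, 3]
stride = 2

# Treat y as the convolution input and x as the convolution output.
p_height = solve_padding(w=y_shape[1], f=f_shape[0], s=stride, out=x_shape[1])
p_width = solve_padding(w=y_shape[2], f=f_shape[1], s=stride, out=x_shape[2])

print(p_height, p_width)
```

Using `Fraction` keeps the result exact, so the non-integer padding shows up as a fraction in both the height and width dimensions instead of being hidden by floating-point rounding.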