I was reading this tutorial on Keras convolutional autoencoders, and my own calculation doesn't give the (8, 4, 4) shape that these layers are supposed to produce. Since the stride is large, the spatial size of the images should already drop to 3 after the second convolutional layer. So how does it arrive at this shape? Can anyone walk me through the calculation?

I am also confused about how "same" padding is applied in this situation. Everyone says "with stride 1, same padding keeps the dimension the same", and I totally get that. But what happens when the stride isn't 1? How many zeros are added on each side? I know the formula for the output size, floor((h + 2p - k) / s) + 1, but what is p in this case?
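One common answer to "what is p" is TensorFlow's rule for "same" padding (assumed here; not every framework splits the padding this way). The output size is ceil(h / s), and the total padding is whatever makes the formula produce that size, split as evenly as possible with any odd extra zero going on the bottom/right. The helper same_padding below is hypothetical, written only to illustrate that rule:

```python
import math


def same_padding(h, k, s):
    """Return (output_size, (pad_before, pad_after)) for 'same' padding.

    Assumes TensorFlow's convention: output = ceil(h / s), and any odd
    leftover zero is placed after (bottom/right), not before.
    """
    out = math.ceil(h / s)
    total = max((out - 1) * s + k - h, 0)  # zeros needed along this axis
    before = total // 2
    after = total - before
    return out, (before, after)


for h, k, s in [(28, 3, 1), (28, 3, 3), (7, 2, 2)]:
    out, (before, after) = same_padding(h, k, s)
    # With possibly asymmetric padding, "2p" in the formula becomes before + after.
    assert out == (h + before + after - k) // s + 1
    print(f"h={h}, k={k}, s={s}: pad {before} before / {after} after -> output {out}")
```

Under this rule, a 3x3 kernel with stride 3 on a 28-pixel side gets one zero on each side and produces 10, while a 2x2 pooling on a 7-pixel side gets a single zero on the far side and produces 4. In other words, p is not a fixed number: it depends on h, k and s.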

Thanks

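from keras.layers import Input, Convolution2D, MaxPooling2D

input_img = Input(shape=(1, 28, 28))  # channels-first: (channels, rows, cols)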

x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(input_img)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
encoded = MaxPooling2D((2, 2), border_mode='same')(x)

# at this point the representation is (8, 4, 4) i.e. 128-dimensional
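To check the numbers, here is a small sketch that traces the side length through the encoder above. It assumes "same" padding yields ceil(size / stride) for both the convolutions and the 2x2 poolings (TensorFlow's rule), so the kernel size does not affect the result; encoder_trace is a hypothetical helper, not part of Keras:

```python
import math


def encoder_trace(h, conv_stride, filters=(16, 8, 8)):
    """Trace the side length through conv + 2x2 max-pool stages.

    Assumes 'same' padding everywhere, so every layer outputs
    ceil(size / stride). Returns the list of side lengths and the final
    (channels, rows, cols) shape.
    """
    sizes = [h]
    for _ in filters:
        h = math.ceil(h / conv_stride)  # Convolution2D, border_mode='same'
        sizes.append(h)
        h = math.ceil(h / 2)            # MaxPooling2D((2, 2)), border_mode='same'
        sizes.append(h)
    return sizes, (filters[-1], h, h)


print(encoder_trace(28, conv_stride=1))  # ([28, 28, 14, 14, 7, 7, 4], (8, 4, 4))
print(encoder_trace(28, conv_stride=3))  # ([28, 10, 5, 2, 1, 1, 1], (8, 1, 1))
```

With stride 1 the convolutions keep the size and only the poolings halve it; the last pooling turns 7 into 4 because "same" pads the odd side, which is where (8, 4, 4) comes from. With stride 3 the map collapses to 1x1.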

1 Answer

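Oh no, I think I know what happened: the code in the tutorial is wrong. I found this question, which cites the same tutorial with the correct code. The version I read (it's a translation) is missing the parentheses around the kernel size in every Convolution2D layer: it should be 16, (3, 3) rather than 16, 3, 3. In the Keras 2 signature, Conv2D(filters, kernel_size, strides=(1, 1), ...), the third positional parameter is strides, so 16, 3, 3 would set a stride of 3, while 16, (3, 3) keeps the default stride of 1. That explains the (8, 4, 4) shape: with a stride of 3 you can't get this dimension.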
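To see how the missing parentheses change the meaning, here is a sketch using a hypothetical stand-in function that only mirrors the order of the first few parameters of Keras 2's Conv2D (filters, kernel_size, strides, padding). It is not the real layer; it just returns the arguments so you can see how Python binds a positional call:

```python
def conv2d_signature_stub(filters, kernel_size, strides=(1, 1), padding="valid"):
    """Hypothetical stand-in mirroring the leading parameters of
    Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', ...).
    Returns the bound arguments instead of building a layer."""
    return {"filters": filters, "kernel_size": kernel_size,
            "strides": strides, "padding": padding}


# Without parentheses, the second 3 lands in `strides`.
print(conv2d_signature_stub(16, 3, 3, padding="same"))
# {'filters': 16, 'kernel_size': 3, 'strides': 3, 'padding': 'same'}

# With the kernel size as a tuple, the stride keeps its default of 1.
print(conv2d_signature_stub(16, (3, 3), padding="same"))
# {'filters': 16, 'kernel_size': (3, 3), 'strides': (1, 1), 'padding': 'same'}
```

In real Keras 2 code the first layer would then read Conv2D(16, (3, 3), activation='relu', padding='same'), since padding replaced the Keras 1 border_mode argument.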