I was reading this tutorial on Keras convolutional autoencoders, and I can't reproduce the (8, 4, 4) dimension it reports after these layers. By my calculation, the image dimension should already drop to 3 after the second convolutional layer, since the stride is large. So how does it arrive at this dimension? Can anyone explain the calculation?
I am also confused about how "same" padding is applied in this situation. Explanations always say "when stride=1, same padding keeps the dimension unchanged", and I get that. But what happens when the stride isn't 1? How many zeros are added on each side? I know the formula for the output dimension, floor((h + 2p - k) / s) + 1, but what is p in this case?
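To make the question about p concrete, here is a minimal sketch of how "same" padding could be computed along one axis. It assumes the TensorFlow "SAME" convention (output size is ceil(h / s), with any odd padding going on the trailing side); other backends may split the zeros differently. The helper names `same_padding` and `out_size_from_padding` are hypothetical, not part of Keras.

```python
import math

def same_padding(in_size, kernel, stride):
    """Return (out_size, pad_before, pad_after) along one axis.

    Assumption: TensorFlow-style SAME padding, where the output size is
    ceil(in_size / stride) and the extra zero (if any) goes at the end.
    """
    out_size = math.ceil(in_size / stride)
    # Total zeros needed so that the last window still fits.
    total_pad = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = total_pad // 2
    pad_after = total_pad - pad_before
    return out_size, pad_before, pad_after

def out_size_from_padding(in_size, kernel, stride, total_pad):
    """The formula from the question, with 2p replaced by the total padding."""
    return (in_size + total_pad - kernel) // stride + 1

print(same_padding(28, 3, 1))  # 3x3 convolution, stride 1
print(same_padding(7, 2, 2))   # 2x2 pooling, stride 2, odd input
```

Under this convention the padding need not be symmetric, so the 2p in the formula is better read as the total number of zeros added on both sides.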
Thanks
from keras.layers import Input, Convolution2D, MaxPooling2D

input_img = Input(shape=(1, 28, 28))
x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(input_img)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
encoded = MaxPooling2D((2, 2), border_mode='same')(x)
# at this point the representation is (8, 4, 4) i.e. 128-dimensional
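The spatial size can be traced through these layers with a short sketch. It assumes the Keras defaults: the convolutions use stride 1 (no stride is passed in the code), each MaxPooling2D uses a stride equal to its pool size (2), and "same" padding gives an output of ceil(size / stride). The helper `same_out` and the `layers` list are hypothetical names used only for this illustration.

```python
import math

def same_out(size, stride):
    """Output size along one axis with 'same' padding (assumed ceil rule)."""
    return math.ceil(size / stride)

# (layer description, stride) in the order used by the encoder above.
# Assumption: conv stride defaults to 1, pooling stride defaults to pool size.
layers = [
    ("conv 3x3, 16 filters", 1),
    ("max pool 2x2", 2),
    ("conv 3x3, 8 filters", 1),
    ("max pool 2x2", 2),
    ("conv 3x3, 8 filters", 1),
    ("max pool 2x2", 2),
]

size = 28
trace = [size]
for name, stride in layers:
    size = same_out(size, stride)
    trace.append(size)
    print(f"{name:22s} -> {size} x {size}")
```

With these assumptions the sizes go 28, 28, 14, 14, 7, 7, 4: only the pooling layers shrink the image, and the last one rounds 7 / 2 up to 4, which matches the (8, 4, 4) in the tutorial's comment.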