Output shape of a convolutional layer

Question

I built a convolutional neural network in Keras.

model.add(Convolution1D(nb_filter=111, filter_length=5, border_mode='valid', activation="relu", subsample_length=1))

According to the CS231 lecture, a convolution creates one feature map (i.e. activation map) per filter, and these maps are then stacked together. In my case the convolutional layer has a 300-dimensional input. Hence, I expect the following computation:

Each filter has a window size of 5. Consequently, each filter produces 300-5+1=296 convolutions.
As there are 111 filters there should be a 111*296 output of the convolutional layer.
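The arithmetic above can be checked with a small helper. This is only a sketch in plain Python, not Keras code: the name valid_output_length is hypothetical, and it assumes 'valid' padding (no padding) and an integer stride.

```python
def valid_output_length(input_length, filter_length, stride=1):
    """Number of positions a filter visits under 'valid' (unpadded) convolution."""
    if filter_length > input_length:
        raise ValueError("filter is longer than the input")
    return (input_length - filter_length) // stride + 1


steps = valid_output_length(300, 5, stride=1)
print(steps)         # 296
print((steps, 111))  # (296, 111): one feature map per filter
```

With these numbers the expected 296 positions per filter and 111 stacked maps come out as described, provided the layer really sees 300 steps with a single channel each.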

However, the actual output shapes look different:

convolutional_layer = model.layers[1]
conv_weights, conv_biases = convolutional_layer.get_weights()

print(conv_weights.shape) # (5, 1, 300, 111)
print(conv_biases.shape)  # (111,)

The shape of the bias values makes sense, because there is one bias value for each filter. However, I do not understand the shape of the weights. Apparently, the first dimension depends on the filter size. The third dimension is the number of input neurons, which should have been reduced by the convolution. The last dimension probably refers to the number of filters. This does not make sense to me: how would I easily get the feature map for a specific filter?

Keras uses either Theano or TensorFlow as a backend. According to their documentation, the output of a convolution is a 4D tensor (batch_size, output_channel, output_rows, output_columns).

Can somebody explain the output shape to me in accordance with the CS231 lecture?

Well... the actual output shape is not the shape of the weights. You can see the output shape by building the model and calling model.summary(). But perhaps the dimensions of your input are swapped: (channels x 1D length) versus (1D length x channels). Try reshaping the input with "Reshape((1,300))" or "Reshape((300,1))"; which one you need depends on whether your Keras is configured for channels first or channels last. (Also, I don't know what subsample_length=1 means; it does not seem to be in the Keras documentation.) - Daniel Möller
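To illustrate the difference between the two input layouts mentioned in the comment, here is a rough sketch in plain Python. The function conv1d_shapes is hypothetical and does not call Keras; the weight layout (filter_length, 1, input_channels, nb_filter) is assumed from the shape printed in the question, and the 10 steps in the second call are chosen purely for illustration.

```python
def conv1d_shapes(input_steps, input_channels, nb_filter, filter_length):
    """Return (weight_shape, bias_shape, output_shape) for a 'valid', stride-1 1D convolution.

    The weight layout mirrors the shape printed in the question (an assumption,
    not read from Keras). The output shape omits the batch dimension.
    """
    weight_shape = (filter_length, 1, input_channels, nb_filter)
    bias_shape = (nb_filter,)
    output_shape = (input_steps - filter_length + 1, nb_filter)
    return weight_shape, bias_shape, output_shape


# 300 steps with 1 channel each: the layout the question expects.
print(conv1d_shapes(300, 1, 111, 5))
# ((5, 1, 1, 111), (111,), (296, 111))

# 300 channels per step: reproduces the weight shape seen in the question.
print(conv1d_shapes(10, 300, 111, 5))
# ((5, 1, 300, 111), (111,), (6, 111))
```

Under these assumptions, a third weight dimension of 300 suggests the layer is treating 300 as the channel count rather than the sequence length.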

Jai Jai · Accepted Answer · 2018-01-15T21:30:40

Your weight dimensions have to be [filter_height, filter_width, in_channel, out_channel].
In your example, I think the number of input channels, which is the depth of the input, is 300, and you want the number of output channels to be 111.
The total number of filters is 111, not 300*111.
As you said yourself, there is one bias per filter, so 111 biases for 111 filters.
Each of the 111 filters produces one convolution over the input.
The weight shape in your case means that you are using a kernel patch of shape 5*1.
The third dimension means that the depth of the input feature map is 300.
The fourth dimension means that the depth of the output feature map is 111.
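To make the roles of these dimensions concrete, here is a naive pure-Python sketch of a 'valid', stride-1 1D convolution. It is not how Keras or its backends implement the operation: the function conv1d_valid and the toy sizes are hypothetical, and the singleton filter_width dimension is dropped for readability. Each filter spans all input channels and yields one feature map, which is one column of the output.

```python
def conv1d_valid(inputs, weights, biases):
    """Naive 'valid', stride-1 1D convolution (cross-correlation, no kernel flip).

    inputs:  [steps][in_channels]
    weights: [filter_length][in_channels][out_channels]
    biases:  [out_channels]
    returns: [new_steps][out_channels]
    """
    filter_length = len(weights)
    in_channels = len(weights[0])
    out_channels = len(biases)
    new_steps = len(inputs) - filter_length + 1
    output = []
    for t in range(new_steps):
        row = []
        for f in range(out_channels):
            total = biases[f]
            # Each filter sums over the whole window and over every input channel.
            for k in range(filter_length):
                for c in range(in_channels):
                    total += inputs[t + k][c] * weights[k][c][f]
            row.append(total)
        output.append(row)
    return output


# Toy sizes: 6 steps, 2 input channels, filter length 3, 4 filters.
inputs = [[float(t), 1.0] for t in range(6)]
weights = [[[1.0] * 4 for _ in range(2)] for _ in range(3)]
biases = [0.0, 1.0, 2.0, 3.0]

out = conv1d_valid(inputs, weights, biases)
print(len(out), len(out[0]))             # 4 4  (new_steps, out_channels)
feature_map_2 = [row[2] for row in out]  # feature map of the filter at index 2
print(feature_map_2)                     # [8.0, 11.0, 14.0, 17.0]
```

The feature map of a specific filter is therefore read from the layer's output along the last axis, not from the weight tensor.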
