1 vote

I built a ConvNet in Keras, and these are 2 of the layers:

model.add(Conv2D(8, 3, input_shape=(28, 28, 1)))
model.add(Activation(act))

model.add(Conv2D(16, 3))
model.add(Activation(act))

The output of the first layer is of size 26x26x8, which I completely understand, since there are 8 filters of size 3x3 and each of them is applied to produce a separate feature map, hence 26x26x8.

The output of the second layer is of size 24x24x16, which I do not understand. Shouldn't the output be of size 24x24x128, since each of the 16 filters of the second layer will act on each of the 8 feature maps output by the first layer?

Basically, I do not understand how the output of one layer is fed to the input of the next layer.
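
For reference, here is a minimal runnable sketch that reproduces the shapes above (assuming act = 'relu' and the tensorflow.keras API; the other layers of the model are omitted):

# Minimal sketch, assuming act = 'relu'; only the two layers quoted above are included.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation

act = 'relu'

model = Sequential()
model.add(Conv2D(8, 3, input_shape=(28, 28, 1)))
model.add(Activation(act))
model.add(Conv2D(16, 3))
model.add(Activation(act))

model.summary()
# conv2d   output shape: (None, 26, 26, 8)
# conv2d_1 output shape: (None, 24, 24, 16)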

No, each filter is applied across all the channels (i.e. the depth axis) of the previous layer's output. For example, in your case, each filter in the second Conv2D layer has a shape of (3, 3, 8).
I think you are referring to the 1st layer, since that layer has 8 filters. Am I right? – Tanmay Bhatnagar
No, I am referring to each filter of the second layer. Each filter (i.e. kernel) in that layer has a shape of (3, 3, 8). Each filter in the first Conv2D layer has a shape of (3, 3, 1). In general, each filter in a Conv2D layer has a shape of (filter_height, filter_width, num_channels_in_output_of_previous_layer). Is that clear, or do I need to explain more?
The response of each filter (i.e. its feature map) in a Conv2D layer always has a single channel. That's why, since the second layer has 16 filters, you get 16 feature maps as output.
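
To make the comment above concrete, here is a small sketch (assuming the same two Conv2D layers as in the question, with a placeholder 'relu' activation) that inspects the kernel shapes directly:

# Sketch: inspect the kernel weights to confirm the filter shapes described above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation

model = Sequential([
    Conv2D(8, 3, input_shape=(28, 28, 1)),
    Activation('relu'),
    Conv2D(16, 3),
    Activation('relu'),
])

first_kernel = model.layers[0].get_weights()[0]   # kernel of the first Conv2D
second_kernel = model.layers[2].get_weights()[0]  # kernel of the second Conv2D

print(first_kernel.shape)   # (3, 3, 1, 8)  -> 8 filters, each of shape (3, 3, 1)
print(second_kernel.shape)  # (3, 3, 8, 16) -> 16 filters, each of shape (3, 3, 8)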

2 Answers

2 votes

No, it's a convolution over a volume: each filter is applied across all channels of its input.
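
A quick way to check this is a sketch like the following (assuming the tensorflow.keras API, with random data standing in for the real feature maps):

import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.random.rand(1, 26, 26, 8).astype('float32')  # stand-in for the first layer's output
layer = Conv2D(16, 3)       # 16 filters; each one spans all 8 input channels
y = layer(x)

print(y.shape)              # (1, 24, 24, 16), not (1, 24, 24, 128)
print(layer.kernel.shape)   # (3, 3, 8, 16)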

0 votes

I would have loved it if someone had taken the time to actually write out the mathematics, but I'm guessing no one knew what the actual operations were. The ambiguous phrase "applied on all channels" could describe exactly what the OP thought was going on; a commenter above used it to mean that the per-channel results are summed over all channels, which is not clear from the wording alone.

I had the same question as the OP, and I found the answer: the Conv2D layer in Keras creates filters whose last dimension equals the number of channels in the layer's input.

Say you have an input X of shape (6, 6, 3), i.e. a tensor of size 6×6 with 3 channels (colors or whatever). Then creating a 2D convolution layer with

conv = Conv2D(2, 3, input_shape=(6, 6, 3))

will create 2 filters of shape (3, 3, 3), f1 and f2. Applying a filter the correct way to the input, at output location $(a, b)$, looks like $\sum_{i,j,k} (f_1)_{ijk}\, X_{a+i,\, b+j,\, k}$, where $i$ and $j$ range over the 3×3 window at that location and $k$, the color channel, is summed over all its values, i.e. 1, 2, and 3 here. This produces an output of size (4, 4, 1) for each filter. Together, the two filters produce an output of size (4, 4, 2).
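
Here is a plain-NumPy sketch of that summation (random data, no bias term; the explicit loops are for clarity, not how Keras implements it):

import numpy as np

X = np.random.rand(6, 6, 3)           # input of shape (6, 6, 3)
filters = np.random.rand(2, 3, 3, 3)  # 2 filters, f1 and f2, each of shape (3, 3, 3)

out = np.zeros((4, 4, 2))
for n in range(2):          # for each filter
    for a in range(4):      # output row
        for b in range(4):  # output column
            # sum over i, j (the 3x3 window) and k (the 3 channels)
            out[a, b, n] = np.sum(filters[n] * X[a:a+3, b:b+3, :])

print(out.shape)            # (4, 4, 2)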

If we had assumed, as the OP seems to have, that each filter applied to a 3-channel input was only of shape (3, 3, 1), then it would be unclear how to apply it to a 3-dimensional tensor, which might lead someone who cares about the actual operations to think that the filters are applied as a tensor product, producing an output of significantly higher dimension from the layer.