I would have loved it if someone had taken the time to actually write out the mathematics, but I'm guessing no one knew what the actual operations were. The phrase "applied on all channels" is ambiguous: it could describe what the OP thought was going on, yet a commenter above used it to mean the results are summed over all channels. Not clear.
I had the same question as the OP, and I found the answer: in Keras, the Conv2D layer creates filters whose final dimension equals the number of channels in the input.
Say you have an input X of shape (6, 6, 3): a 6×6 tensor with 3 channels (colors or whatever). Then creating a 2D convolution layer with
from tensorflow.keras.layers import Conv2D
conv = Conv2D(2, 3, input_shape=(6, 6, 3))
will create 2 filters $f_1$ and $f_2$, each of shape (3, 3, 3). Applying a filter, say $f_1$, at a given output location computes $\sum_{i,j,k} (f_1)_{ijk}\, X_{ijk}$, where $i$ and $j$ run over the relevant indexes for that location and $k$, the channel index, is summed over all of its values, i.e. 1, 2, and 3 here. This produces an output of size (4, 4, 1) per filter; together the two filters produce an output of size (4, 4, 2).
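Here is a minimal check of these shapes, a sketch assuming TensorFlow 2.x, where Conv2D stores both filters stacked along the last axis of a single kernel tensor:

import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.random.rand(1, 6, 6, 3).astype("float32")  # a batch of one (6, 6, 3) input
conv = Conv2D(2, 3, input_shape=(6, 6, 3))
y = conv(x)  # calling the layer builds it and applies it

kernel, bias = conv.get_weights()
print(kernel.shape)  # (3, 3, 3, 2): two (3, 3, 3) filters; f1 is kernel[..., 0], f2 is kernel[..., 1]
print(y.shape)       # (1, 4, 4, 2): a (4, 4, 2) output for the single input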
If we had assumed, as the OP seems to have, that each filter for a 3-channel input had shape (3, 3, 1), it would be unclear how to apply it to a 3-dimensional tensor, which might cause someone who cares about the actual operations to think the filters are applied as a tensor product, producing an output of significantly higher dimension from the layer.
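To make the sum above concrete, here is a hedged sketch that applies one (3, 3, 3) filter by hand and checks it against the layer's output. naive_conv2d is an illustrative helper name, not a library function; note that Keras Conv2D computes cross-correlation, so no kernel flip is needed:

import numpy as np
from tensorflow.keras.layers import Conv2D

def naive_conv2d(x, f):
    """Valid-padding correlation: out[a, b] = sum over i, j, k of f[i,j,k] * x[a+i, b+j, k]."""
    h = x.shape[0] - f.shape[0] + 1
    w = x.shape[1] - f.shape[1] + 1
    out = np.empty((h, w), dtype=x.dtype)
    for a in range(h):
        for b in range(w):
            # elementwise product over the (3, 3, 3) window, summed over i, j, and k
            out[a, b] = np.sum(f * x[a:a + f.shape[0], b:b + f.shape[1], :])
    return out

x = np.random.rand(6, 6, 3).astype("float32")
conv = Conv2D(2, 3, use_bias=False)
y = conv(x[np.newaxis])         # shape (1, 4, 4, 2)
kernel = conv.get_weights()[0]  # shape (3, 3, 3, 2)

for n in range(2):  # one check per filter
    assert np.allclose(naive_conv2d(x, kernel[..., n]), y[0, ..., n], atol=1e-5)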
For reference, the comment thread states the general rule:

"Each filter in the second Conv2D layer has a shape of (3, 3, 8). Each filter in the first Conv2D layer has a shape of (3, 3, 1). In general, each filter in a Conv2D layer has a shape of (filter_height, filter_width, num_channels_in_output_of_previous_layer). Is it clear or do I need to explain more?" – today
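A small sketch illustrating that rule; the 28×28×1 input and the filter counts of 8 and 16 are illustrative assumptions, chosen so the second layer's filters come out as (3, 3, 8):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    Conv2D(8, 3, input_shape=(28, 28, 1)),  # kernel shape (3, 3, 1, 8): eight (3, 3, 1) filters
    Conv2D(16, 3),                          # kernel shape (3, 3, 8, 16): sixteen (3, 3, 8) filters
])
for layer in model.layers:
    # the last axis stacks the filters; each individual filter is kernel[..., n]
    print(layer.kernel.shape)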