Understanding the output shape of conv2d layer in keras

Question

I do not understand why the channel dimension is not included in the output dimension of a conv2D layer in Keras.

I have the following model

def create_model():
    image = Input(shape=(128,128,3))

    x = Conv2D(24, kernel_size=(8,8), strides=(2,2), activation='relu', name='conv_1')(image)
    x = Conv2D(24, kernel_size=(8,8), strides=(2,2), activation='relu', name='conv_2')(x)
    x = Conv2D(24, kernel_size=(8,8), strides=(2,2), activation='relu', name='conv_3')(x)
    flatten = Flatten(name='flatten')(x)

    output = Dense(1, activation='relu', name='output')(flatten)
    model = Model(input=image, output=output)
    return model

model = create_model()
model.summary()

The model summary is given the figure at the end of my question. The input layer takes RGB images with width = 128 and height = 128. The first conv2D layer tells me the output dimension is (None, 61, 61, 24). I have used the kernel size of (8, 8), a stride of (2, 2) no padding. The values 61 = floor( (128 - 8 + 2 * 0)/2 + 1) and 24 (number of kernels/filters) makes sense. But why isn't the dimension for the different channels included in the dimension? As far as I can see the parameters for the 24 filters on each of the channels is included in the number of the parameters. So I would expect the output dimension to be (None, 61, 61, 24, 3) or (None, 61, 61, 24 * 3). Is this just a strange notation in Keras or am I confused about something else?

You are confused about something else, a Conv2D layer outputs n feature maps, which is the number of kernels or filters, and the channels dimension is always equal to the number of output feature maps. — Dr. Snoopy
@MatiasValdenegro: But I thought that the convolution is scanning over each channel of the image so I would assume that there is an additional dimension. The convolution should give (61, 61, 24) as output but for each of the layers. Is this implicitly implied? — MachineLearner
@MatiasValdenegro: Does the convolution scan all three channels at the same time? — MachineLearner
@MachineLearner Yes, therefore I wrote explicity (in_channels, k, k) — Vlad

Bambam Bambam · Accepted Answer · 2019-07-30T09:24:44

This question is asked in various forms all over the internet and has a simple answer which is often missed or confused:

SIMPLE ANSWER: The Keras Conv2D layer, given a multi-channel input (e.g. a color image), will apply the filter across ALL the color channels and sum the results, producing the equivalent of a monochrome convolved output image.

An example, from the keras.io website cifar CNN example:

(1) You're training with the CIFAR image dataset, which is made up of 32x32 color images, i.e. each image is shape (32,32,3) (RGB = 3 channels)

(2) Your first layer of your network is a Conv2D Layer with 32 filters, each specified as 3x3, so:

Conv2D(32, (3,3), padding='same', input_shape=(32,32,3))

(3) Counter-intuitively, Keras will configure each filter as (3,3,3), i.e. a 3D volume covering the 3x3 pixels PLUS all the color channels. As a minor detail each filter has an additional weight for a BIAS value, as per normal neural network layer arithmetic.

(4) Convolution proceeds absolutely as normal, except a 3x3x3 VOLUME from the input image is convolved at each step with the 3x3x3 filter, and a single (monochrome) output value (i.e. like a pixel) is produced at each step.

(5) The result is a Keras Conv2D convolution of a specified (3,3) filter on a (32,32,3) image produces a (32,32) result because the actual filter used is (3,3,3).

(6) In this example, we have also specified 32 filters in the Conv2D layer, so the actual output is (32,32,32) for each input image (i.e. you might think of this as 32 images, one for each filter, each 32x32 monochrome pixels).

As a check, you can look at the count of weights (Param #) for the layer produced by model.summary():

Layer (type)         Output shape       Param#
conv2d_1 (Conv2D)   (None, 32, 32, 32)  896

There are 32 filters, each 3x3x3 (i.e. 27 weights) plus 1 for the bias (i.e. total 28 weights each). And 32 filters x 28 weights each = 896 Parameters.

Understanding the output shape of conv2d layer in keras

4 Answers