0 votes

Using Keras, I am trying to rebuild a basic CNN architecture I found in a paper. The paper describes the architecture as follows:

  1. normalized input 36x36
  2. 1st convolutional feature map 32x32 (3@1x5x5 kernel)
  3. 2nd convolutional feature map 28x28 (3@3x5x5 kernel)
  4. 1st max pooling output 14x14
  5. 3rd convolutional feature map 10x10 (3@3x5x5 kernel)
  6. 2nd max pooling output 5x5
  7. Flatten
  8. Fully connected layer 75 nodes
  9. Fully connected layer 10 nodes
  10. Fully connected layer 2 nodes
  11. Output

The activation functions are said to be ReLU.
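
If I am reading the sizes right, they follow from "valid" 5x5 convolutions and 2x2 max pooling; here is a quick sketch of that arithmetic (plain Python, the helper names are just mine for illustration):

def conv_valid(size, kernel=5):
    return size - kernel + 1   # a "valid" convolution shrinks each side by kernel - 1

def pool(size, window=2):
    return size // window      # 2x2 max pooling halves each side

s = 36              # normalized input
s = conv_valid(s)   # 1st convolution -> 32
s = conv_valid(s)   # 2nd convolution -> 28
s = pool(s)         # 1st max pooling -> 14
s = conv_valid(s)   # 3rd convolution -> 10
s = pool(s)         # 2nd max pooling -> 5
print(s)            # 5, i.e. 5x5 feature maps before the fully connected layers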

I came up with the following code in Keras to replicate the architecture:

from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization, Conv2D, MaxPooling2D, Flatten
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(36, input_shape=(36,36,1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, (5, 5)))
model.add(Activation('relu'))
model.add(Conv2D(32, (5, 5)))
model.add(Activation('relu'))
model.add(Conv2D(32, (5, 5)))
model.add(Activation('relu'))

model.add(Conv2D(28, (5, 5)))
model.add(Activation('relu'))
model.add(Conv2D(28, (5, 5)))
model.add(Activation('relu'))
model.add(Conv2D(28, (5, 5)))
model.add(Activation('relu'))


model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(10, (5, 5),padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(10, (5, 5),padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(10, (5, 5),padding="same"))
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2, 2),padding="same"))


model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(75))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))


model.compile(Adam(lr=.01), loss='categorical_crossentropy', metrics=['acc'])

However, I am not sure how to interpret the kernel notation given for the convolutional feature maps. More specifically, I do not understand why three numbers are given (e.g. 3@1x5x5) when I can only pass a 2-tuple as the "kernel_size" in my Keras model.

Comment (1 vote): "I do not understand why there are 3 dimensions" - one for each colour channel. - Mitch Wheat

1 Answer

3 votes

It would be easier if you attached the paper, but from what we have, it should be as follows:

3@1x5x5 means the kernel size is 5x5, 1 is the number of input channels, and 3 is the number of output channels (i.e. the number of feature maps the layer produces).

I have not used Keras, but it should look something like this:

from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(BatchNormalization(input_shape=(36, 36, 1)))  # normalized 36x36 input, 1 channel
model.add(Conv2D(3, (5, 5), activation='relu'))         # 3@1x5x5 -> 32x32 feature maps
model.add(Conv2D(3, (5, 5), activation='relu'))         # 3@3x5x5 -> 28x28 feature maps
model.add(MaxPooling2D(pool_size=(2, 2)))               # -> 14x14
model.add(Conv2D(3, (5, 5), activation='relu'))         # 3@3x5x5 -> 10x10 feature maps
model.add(MaxPooling2D(pool_size=(2, 2)))               # -> 5x5

model.add(Flatten())  # this converts the 3D feature maps (5x5x3) to 1D feature vectors
model.add(Dense(75, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(Adam(lr=.01), loss='categorical_crossentropy', metrics=['accuracy'])
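
Note that 5x5x3 flattens to exactly 75 values, which matches the 75-node fully connected layer, so the 3-channel reading looks consistent. As a quick sanity check you can print the per-layer output shapes; the values I would expect (my own arithmetic, batch dimension omitted) are listed in the comments:

# Expected output shapes for a single-channel 36x36 input:
#   BatchNormalization -> (36, 36, 1)
#   Conv2D             -> (32, 32, 3)
#   Conv2D             -> (28, 28, 3)
#   MaxPooling2D       -> (14, 14, 3)
#   Conv2D             -> (10, 10, 3)
#   MaxPooling2D       -> (5, 5, 3)
#   Flatten            -> (75,)   # matches the 75-node fully connected layer
model.summary()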