2 votes

I'm trying to improve the CNN architecture below. I'm using the CNN for image classification. Can anyone suggest any changes to the architecture that would reduce training time without losing too much accuracy?

Notes on Architecture:

It has a convolutional layer of 16 filters with a 3x3 window to handle the initial input to the network.

It's followed by a 2x2 max pooling layer.

Next is another convolutional layer of the same size as the first, so as to preserve the information passed through from the prior layer.

The third convolutional layer increases to 32 filters, which lets the network start looking at finer detail and opens up room for more features.

The third convolutional layer feeds a global average pooling layer, whose output is then passed into the fully connected layers.

The first fully connected hidden layer uses 64 units; this was my estimate to provide a buffer before the output layer and give the network more room to determine the weights.

It is followed by a dropout layer to help prevent overfitting, before finally being passed to the output layer that makes the prediction.

The output layer has a softmax activation function, which keeps the outputs a valid probability distribution with values between 0 and 1.

CNN Code:

from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Dense
from keras.models import Sequential

model = Sequential()
model.add(Conv2D(16, (3, 3), input_shape=(224, 224, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(GlobalAveragePooling2D())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(units=133, activation='softmax'))
model.summary()
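
For reference, assuming Keras defaults ('valid' padding and unit strides for the convolutions, pooling stride equal to pool size), the summary should report these output shapes:

# Conv2D(16, 3x3)          -> (222, 222, 16)
# MaxPooling2D(2x2)        -> (111, 111, 16)
# Conv2D(16, 3x3)          -> (109, 109, 16)
# MaxPooling2D(2x2)        -> (54, 54, 16)
# Conv2D(32, 3x3)          -> (52, 52, 32)
# GlobalAveragePooling2D   -> (32,)
# Dense(64) / Dropout(0.3) -> (64,)
# Dense(133, softmax)      -> (133,)
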
Your images are relatively large (224x224x3); try smaller sizes and see how much that influences your performance. Depending on the images, you might also try converting them to grayscale beforehand. – aseipel
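
Along the lines of that comment, a minimal preprocessing sketch using Pillow; the 112x112 target size and 'photo.jpg' path are illustrative placeholders, and the model's input_shape would need to change to match:

from PIL import Image
import numpy as np

# Shrink and convert to grayscale before feeding the network.
img = Image.open('photo.jpg').convert('L')    # 'L' = single-channel grayscale
img = img.resize((112, 112))
x = np.asarray(img, dtype='float32') / 255.0  # scale pixel values to [0, 1]
x = x.reshape(112, 112, 1)                    # matches input_shape=(112, 112, 1)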

1 Answer

3 votes

Most of the training computation happens in the first Conv2D layer:

Conv2D(16, (3, 3), input_shape=(224, 224, 3), activation='relu')

There are (224 - 2) * (224 - 2) = 49284 spatial patches of size 3x3 and 16 filters in this layer, which in total gives almost 800k (788544, to be exact) convolution operations per forward and backward pass. And this doesn't even take your batch size into account.
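
A quick sanity check of that arithmetic in plain Python (the 16 filters and 224x224 input are taken from the model above):

patches = (224 - 2) * (224 - 2)  # valid 3x3 convolution: 222 x 222 positions
ops = patches * 16               # one convolution per patch per filter
print(patches, ops)              # 49284 788544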

What I suggest is to use striding in the first layer; for example, strides=(2, 2) will reduce the number of patches by a factor of 4. In addition, with striding the network performs the downsampling itself, which means you can get rid of the next MaxPooling2D layer and get the same feature map size from the convolutional layer alone, as in the sketch below.
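
A minimal sketch of that change (only the first layers differ; the rest is as posted). With 'valid' padding, a 3x3 kernel and stride 2 give (224 - 3) // 2 + 1 = 111, the same 111x111x16 map the original Conv2D + MaxPooling2D pair produced:

model = Sequential()
# Strided convolution downsamples in the convolution itself,
# so the first MaxPooling2D layer is dropped.
model.add(Conv2D(16, (3, 3), strides=(2, 2), input_shape=(224, 224, 3), activation='relu'))
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(GlobalAveragePooling2D())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(units=133, activation='softmax'))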

Of course, the network is going to lose some flexibility, but it shouldn't affect the accuracy that much.