Why is my CNN overfitting and how can I fix it?

Question

I am finetuning a 3D-CNN called C3D which was originally trained to classify sports from video clips.

I am freezing the convolution (feature extraction) layers and training the fully connected layers using gifs from GIPHY to classify the gifs for sentiment analysis (positive or negative).

Weights are preloaded for all layers except the final fully connected layer.

I am using 5000 images (2500 positive, 2500 negative) for training with a 70/30 training/testing split using Keras. I am using the Adam optimizer with a learning rate of 0.0001.

The training accuracy increases and the training loss decreases during training, but very early on the validation accuracy and loss stop improving as the model starts to overfit.

I believe I have enough training data and am using a dropout of 0.5 on both of the fully connected layers, so how can I combat this overfitting?

The model architecture, training code and visualisations of training performance from Keras can be found below.

train_c3d.py

from training.c3d_model import create_c3d_sentiment_model
from ImageSentiment import load_gif_data
import numpy as np
import pathlib
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam


def image_generator(files, batch_size):
    """
    Generate batches of images for training instead of loading all images into memory
    :param files:
    :param batch_size:
    :return:
    """
    while True:
        # Select files (paths/indices) for the batch
        batch_paths = np.random.choice(a=files,
                                       size=batch_size)
        batch_input = []
        batch_output = []

        # Read in each input, perform preprocessing and get labels
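        for input_path in batch_paths:
            gif_data = load_gif_data(input_path)  # avoid shadowing the built-in input()
            if gif_data is None:  # load_gif_data returns None when a file fails to load
                continue
            if "pos" in input_path:  # if file name contains pos
                label = np.array([1, 0])  # label
            elif "neg" in input_path:  # if file name contains neg
                label = np.array([0, 1])  # label
            else:
                continue  # skip files whose name gives no label

            batch_input += [gif_data]
            batch_output += [label]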
        # Return a tuple of (input,output) to feed the network
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)

        yield (batch_x, batch_y)


model = create_c3d_sentiment_model()
print(model.summary())
model.load_weights('models/C3D_Sport1M_weights_keras_2.2.4.h5', by_name=True)

for layer in model.layers[:14]:  # freeze top layers as feature extractor
    layer.trainable = False
for layer in model.layers[14:]:  # fine tune final layers
    layer.trainable = True

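# glob('**/*') also matches directories, so keep regular files only
train_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_train').glob('**/*')
               if filepath.is_file()]
val_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_validation').glob('**/*')
             if filepath.is_file()]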

batch_size = 8
train_generator = image_generator(train_files, batch_size)
validation_generator = image_generator(val_files, batch_size)

model.compile(optimizer=Adam(lr=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

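# save_best_only=True keeps only the epoch with the lowest val_loss instead of overwriting every epoch
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', verbose=1, save_best_only=True)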

history = model.fit_generator(train_generator, validation_data=validation_generator,
                              steps_per_epoch=int(np.ceil(len(train_files) / batch_size)),
                              validation_steps=int(np.ceil(len(val_files) / batch_size)), epochs=5, shuffle=True,
                              callbacks=[mc])

load_gif_data()

def load_gif_data(file_path):
    """
    Load and process gif for input into Keras model
    :param file_path:
    :return: Mean normalised image in BGR format as numpy array
             for more info see -> http://cs231n.github.io/neural-networks-2/
    """
    im = Img(fp=file_path)
    try:
        im.load(limit=16,  # Keras image model only requires 16 frames
                first=True)
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print("Error loading image: " + file_path)
        return None
    im.resize(size=(112, 112))
    im.convert('RGB')
    im.close()

    np_frames = []
    frame_index = 0
    for i in range(16):  # if image is less than 16 frames, repeat the frames until there are 16
        frame = im.frames[frame_index]
        rgb = np.array(frame)
        bgr = rgb[..., ::-1]
        mean = np.mean(bgr, axis=0)
        np_frames.append(bgr - mean)  # C3D model was originally trained on BGR, mean normalised images
        # it is important that unseen images are in the same format
        if frame_index == (len(im.frames) - 1):
            frame_index = 0
        else:
            frame_index = frame_index + 1

    return np.array(np_frames)

model architecture

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1 (Conv3D)               (None, 16, 112, 112, 64)  5248      
_________________________________________________________________
pool1 (MaxPooling3D)         (None, 16, 56, 56, 64)    0         
_________________________________________________________________
conv2 (Conv3D)               (None, 16, 56, 56, 128)   221312    
_________________________________________________________________
pool2 (MaxPooling3D)         (None, 8, 28, 28, 128)    0         
_________________________________________________________________
conv3a (Conv3D)              (None, 8, 28, 28, 256)    884992    
_________________________________________________________________
conv3b (Conv3D)              (None, 8, 28, 28, 256)    1769728   
_________________________________________________________________
pool3 (MaxPooling3D)         (None, 4, 14, 14, 256)    0         
_________________________________________________________________
conv4a (Conv3D)              (None, 4, 14, 14, 512)    3539456   
_________________________________________________________________
conv4b (Conv3D)              (None, 4, 14, 14, 512)    7078400   
_________________________________________________________________
pool4 (MaxPooling3D)         (None, 2, 7, 7, 512)      0         
_________________________________________________________________
conv5a (Conv3D)              (None, 2, 7, 7, 512)      7078400   
_________________________________________________________________
conv5b (Conv3D)              (None, 2, 7, 7, 512)      7078400   
_________________________________________________________________
zeropad5 (ZeroPadding3D)     (None, 2, 8, 8, 512)      0         
_________________________________________________________________
pool5 (MaxPooling3D)         (None, 1, 4, 4, 512)      0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0         
_________________________________________________________________
fc6 (Dense)                  (None, 4096)              33558528  
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0         
_________________________________________________________________
fc7 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
dropout_2 (Dropout)          (None, 4096)              0         
_________________________________________________________________
nfc8 (Dense)                 (None, 2)                 8194      
=================================================================
Total params: 78,003,970
Trainable params: 78,003,970
Non-trainable params: 0
_________________________________________________________________
None

training visualisations

Maybe try a different learning rate and, by the way, run for at least 8-10 epochs. - akhetos

Shubham Panchal · Accepted Answer · 2019-06-20T00:54:18

I think the error is in the combination of the loss function and the last Dense layer. According to the model summary, the last Dense layer is:

nfc8 (Dense) (None, 2)

The output shape is (None, 2), meaning that the layer has 2 units. As you said, you need to classify GIFs as positive or negative.

Classifying GIFs this way can be framed either as a binary classification problem or as a multiclass classification problem with two classes.

A binary classifier has only 1 unit in the last Dense layer, with a sigmoid activation function. Here, however, the model has 2 units in the last Dense layer.

Hence the model is a multiclass classifier, but you have given it the binary_crossentropy loss, which is meant for binary classifiers (with a single unit in the last layer).
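
To make the pairing of output layer and loss concrete, here is a small pure-Python sketch that computes both losses by hand. It is not Keras code and the function names are only illustrative; it assumes a softmax activation on the two-unit layer and a sigmoid on the single-unit layer. It also shows that a two-unit softmax with logits [z, 0] gives the same probability as a single sigmoid unit with logit z, so the two formulations agree once each is paired with its matching loss.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Squash a single logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def categorical_crossentropy(one_hot, probs):
    """Loss for a softmax layer with one unit per class and one-hot labels."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def binary_crossentropy(label, prob):
    """Loss for a single sigmoid unit and a scalar 0/1 label."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


z = 1.5  # hypothetical logit for the "positive" class

# Two-unit formulation: label [1, 0] means "positive", as in the question's generator.
two_unit_loss = categorical_crossentropy([1, 0], softmax([z, 0.0]))

# One-unit formulation: label 1 means "positive".
one_unit_loss = binary_crossentropy(1, sigmoid(z))

print(two_unit_loss, one_unit_loss)  # the two values agree up to floating-point error
```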

So replacing the loss with categorical_crossentropy should work. Alternatively, edit the last Dense layer, changing it to 1 unit with a sigmoid activation function.
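
If you go for the single-unit layer, the labels produced by image_generator have to change shape as well, from one-hot pairs to a single 0/1 value per GIF. A minimal sketch of that labelling step, assuming the same "pos"/"neg" file-name convention as in the question (the helper name label_for_path and the example paths are made up for illustration):

```python
import os


def label_for_path(path, one_hot=True):
    """Derive a sentiment label from the file name (hypothetical helper).

    one_hot=True  -> [1, 0] / [0, 1], for a 2-unit layer with categorical_crossentropy
    one_hot=False -> 1 / 0,           for a 1-unit layer with binary_crossentropy
    """
    name = os.path.basename(path)  # assumption: the label is in the file name, not a parent folder
    if "pos" in name:
        return [1, 0] if one_hot else 1
    if "neg" in name:
        return [0, 1] if one_hot else 0
    raise ValueError("Cannot infer a label from: " + path)


print(label_for_path("data/sample_train/pos_001.gif"))                 # [1, 0]
print(label_for_path("data/sample_train/neg_001.gif", one_hot=False))  # 0
```

Inside the generator, the returned value can be wrapped in np.array exactly as the original labels are.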

Hope this helps.
