I am fine-tuning C3D, a 3D CNN that was originally trained to classify sports from video clips.
I am freezing the convolutional (feature extraction) layers and training the fully connected layers on GIFs from GIPHY to classify each GIF's sentiment (positive or negative).
Pretrained weights are loaded for all layers except the final fully connected layer.
I am using 5000 images (2500 positive, 2500 negative) for training with a 70/30 training/testing split using Keras. I am using the Adam optimizer with a learning rate of 0.0001.
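To make the split concrete, here is a minimal sketch of a 70/30 split done per class, so that both subsets stay balanced. The function name `stratified_split` and the file names are hypothetical stand-ins, not taken from the actual project.

```python
import random


def stratified_split(paths_by_label, train_fraction=0.7, seed=0):
    """Split each class separately so both subsets keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(paths_by_label.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_fraction))
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test


# Hypothetical file names standing in for the 2500 + 2500 GIFs
files = {
    "pos": ["pos_%04d.gif" % i for i in range(2500)],
    "neg": ["neg_%04d.gif" % i for i in range(2500)],
}
train, test = stratified_split(files)
print(len(train), len(test))  # 3500 1500
```

With these numbers the model sees 3500 training clips and 1500 held-out clips, 1750 and 750 per class respectively.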
During training, the training accuracy increases and the training loss decreases, but the validation accuracy and loss stop improving very early on as the model starts to overfit.
I believe I have enough training data, and I am using a dropout of 0.5 on both of the fully connected layers. How can I combat this overfitting?
The model architecture, training code and visualisations of training performance from Keras can be found below.
train_c3d.py
from training.c3d_model import create_c3d_sentiment_model
from ImageSentiment import load_gif_data
import numpy as np
import pathlib
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam


def image_generator(files, batch_size):
    """
    Generate batches of images for training instead of loading all images into memory
    :param files: list of GIF file paths; the label is inferred from "pos"/"neg" in the path
    :param batch_size: number of GIFs drawn per batch
    :return: generator that yields (batch_x, batch_y) tuples indefinitely
    """
    while True:
        # Select files (paths/indices) for the batch
        # Note: np.random.choice samples with replacement by default
        batch_paths = np.random.choice(a=files,
                                       size=batch_size)
        batch_input = []
        batch_output = []

        # Read in each input, perform preprocessing and get labels
        for input_path in batch_paths:
            frames = load_gif_data(input_path)
            if frames is None:  # the GIF could not be loaded, so skip it
                continue
            if "pos" in input_path:  # if file name contains pos
                output = np.array([1, 0])  # label
            elif "neg" in input_path:  # if file name contains neg
                output = np.array([0, 1])  # label
            else:  # no label in the path: skip instead of reusing a stale label
                continue
            batch_input += [frames]
            batch_output += [output]

        if not batch_input:  # every file in this draw was skipped
            continue

        # Return a tuple of (input,output) to feed the network
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)
        yield (batch_x, batch_y)


model = create_c3d_sentiment_model()
print(model.summary())
model.load_weights('models/C3D_Sport1M_weights_keras_2.2.4.h5', by_name=True)

for layer in model.layers[:14]:  # freeze the convolutional layers (conv1 to pool5) as feature extractor
    layer.trainable = False
for layer in model.layers[14:]:  # fine tune final layers
    layer.trainable = True

# Keep files only: glob('**/*') also matches directories
train_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_train').glob('**/*')
               if filepath.is_file()]
val_files = [str(filepath.absolute()) for filepath in pathlib.Path('data/sample_validation').glob('**/*')
             if filepath.is_file()]

batch_size = 8
train_generator = image_generator(train_files, batch_size)
validation_generator = image_generator(val_files, batch_size)

model.compile(optimizer=Adam(lr=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# save_best_only=True keeps only the epoch with the lowest val_loss in best_model.h5
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', verbose=1,
                     save_best_only=True)

history = model.fit_generator(train_generator, validation_data=validation_generator,
                              steps_per_epoch=int(np.ceil(len(train_files) / batch_size)),
                              validation_steps=int(np.ceil(len(val_files) / batch_size)), epochs=5, shuffle=True,
                              callbacks=[mc])
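One thing worth checking in the script above: `image_generator` draws each batch with `np.random.choice`, which samples with replacement by default, so a pass of `ceil(N / batch_size)` steps is not guaranteed to visit every file, and the validation batches differ from epoch to epoch. The sketch below shows one possible alternative that walks a (optionally shuffled) list once per pass. The name `epoch_batches` is hypothetical, and GIF loading is left out so the example runs with the standard library only.

```python
import random


def epoch_batches(files, batch_size, shuffle=True, seed=None):
    """Yield lists of paths forever, covering every file exactly once per pass."""
    rng = random.Random(seed)
    order = list(files)
    while True:
        if shuffle:
            rng.shuffle(order)  # new order for each pass over the data
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]  # last batch may be smaller


# Hypothetical file names
files = ["clip_%d.gif" % i for i in range(10)]
gen = epoch_batches(files, batch_size=4, shuffle=False)
first_pass = [next(gen) for _ in range(3)]  # ceil(10 / 4) = 3 steps per pass
seen = [path for batch in first_pass for path in batch]
print(sorted(seen) == sorted(files))  # True
```

For validation, `shuffle=False` would make every epoch evaluate the same clips, which makes the validation curve easier to compare between epochs.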
load_gif_data()
import numpy as np
from imgpy import Img  # assumption: Img is the image class from the imgpy package


def load_gif_data(file_path):
    """
    Load and process gif for input into Keras model
    :param file_path: path to the GIF file
    :return: Mean normalised image in BGR format as numpy array, or None if the GIF could not be loaded
    for more info see -> http://cs231n.github.io/neural-networks-2/
    """
    im = Img(fp=file_path)
    try:
        im.load(limit=16,  # Keras image model only requires 16 frames
                first=True)
    except Exception:
        print("Error loading image: " + file_path)
        return None

    im.resize(size=(112, 112))
    im.convert('RGB')
    im.close()

    np_frames = []
    frame_index = 0
    for i in range(16):  # if image is less than 16 frames, repeat the frames until there are 16
        frame = im.frames[frame_index]
        rgb = np.array(frame)
        bgr = rgb[..., ::-1]
        mean = np.mean(bgr, axis=0)  # mean over axis 0 (image rows): one value per column and channel
        np_frames.append(bgr - mean)  # C3D model was originally trained on BGR, mean normalised images
        # it is important that unseen images are in the same format
        if frame_index == (len(im.frames) - 1):
            frame_index = 0
        else:
            frame_index = frame_index + 1

    return np.array(np_frames)
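For reference, the mean subtraction in `load_gif_data` uses `np.mean(bgr, axis=0)`. The sketch below, run on a random stand-in frame rather than a real GIF, shows what that computes compared with a per-channel mean over the whole frame. Which of the two matches the preprocessing used for the pretrained C3D weights is not stated here, so treat this only as a way to inspect the difference.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for one 112 x 112 BGR frame (height, width, channels)
frame = rng.randint(0, 256, size=(112, 112, 3)).astype(np.float32)

column_mean = np.mean(frame, axis=0)        # one mean per image column and channel
channel_mean = np.mean(frame, axis=(0, 1))  # one mean per colour channel

centred_by_column = frame - column_mean     # broadcasts over the rows
centred_by_channel = frame - channel_mean   # broadcasts over rows and columns

print(column_mean.shape, channel_mean.shape)  # (112, 3) (3,)
```

Subtracting `column_mean` centres every image column separately, which removes horizontal brightness structure from the frame, while `channel_mean` only shifts each colour channel by a constant.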
model architecture
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1 (Conv3D)               (None, 16, 112, 112, 64)  5248
_________________________________________________________________
pool1 (MaxPooling3D)         (None, 16, 56, 56, 64)    0
_________________________________________________________________
conv2 (Conv3D)               (None, 16, 56, 56, 128)   221312
_________________________________________________________________
pool2 (MaxPooling3D)         (None, 8, 28, 28, 128)    0
_________________________________________________________________
conv3a (Conv3D)              (None, 8, 28, 28, 256)    884992
_________________________________________________________________
conv3b (Conv3D)              (None, 8, 28, 28, 256)    1769728
_________________________________________________________________
pool3 (MaxPooling3D)         (None, 4, 14, 14, 256)    0
_________________________________________________________________
conv4a (Conv3D)              (None, 4, 14, 14, 512)    3539456
_________________________________________________________________
conv4b (Conv3D)              (None, 4, 14, 14, 512)    7078400
_________________________________________________________________
pool4 (MaxPooling3D)         (None, 2, 7, 7, 512)      0
_________________________________________________________________
conv5a (Conv3D)              (None, 2, 7, 7, 512)      7078400
_________________________________________________________________
conv5b (Conv3D)              (None, 2, 7, 7, 512)      7078400
_________________________________________________________________
zeropad5 (ZeroPadding3D)     (None, 2, 8, 8, 512)      0
_________________________________________________________________
pool5 (MaxPooling3D)         (None, 1, 4, 4, 512)      0
_________________________________________________________________
flatten_1 (Flatten)          (None, 8192)              0
_________________________________________________________________
fc6 (Dense)                  (None, 4096)              33558528
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0
_________________________________________________________________
fc7 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
dropout_2 (Dropout)          (None, 4096)              0
_________________________________________________________________
nfc8 (Dense)                 (None, 2)                 8194
=================================================================
Total params: 78,003,970
Trainable params: 78,003,970
Non-trainable params: 0
_________________________________________________________________
None
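The summary reports every parameter as trainable because it is printed before the layers are frozen. As a rough check of what remains trainable afterwards, the sketch below copies the per-layer counts from the summary and applies the same `[:14]` / `[14:]` slices as the training script. The figure of 3500 training clips is an assumption derived from the stated 70% of 5000.

```python
# (layer name, parameter count), in the order shown in the summary
summary = [
    ("conv1", 5248), ("pool1", 0), ("conv2", 221312), ("pool2", 0),
    ("conv3a", 884992), ("conv3b", 1769728), ("pool3", 0),
    ("conv4a", 3539456), ("conv4b", 7078400), ("pool4", 0),
    ("conv5a", 7078400), ("conv5b", 7078400), ("zeropad5", 0), ("pool5", 0),
    ("flatten_1", 0), ("fc6", 33558528), ("dropout_1", 0),
    ("fc7", 16781312), ("dropout_2", 0), ("nfc8", 8194),
]

frozen = sum(count for _, count in summary[:14])     # layers with trainable = False
trainable = sum(count for _, count in summary[14:])  # layers that are fine-tuned

print(frozen, trainable, frozen + trainable)  # 27655936 50348034 78003970

training_clips = 3500  # assumption: 70% of the 5000 GIFs
params_per_clip = trainable // training_clips
print(params_per_clip)
```

So even with the convolutional layers frozen, roughly 50 million weights in `fc6`, `fc7` and `nfc8` are still being fitted, most of them in `fc6`.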