I am working with the Skin Cancer images available on Kaggle for a mini-project, and I am comparing several CNN models. Both VGG16 and VGG19 work on the data and give acceptable results: above 90% accuracy on the training and validation data, and around 85% on the test data.
However, ResNet50 and ResNet152 appear to overfit. They also reach above 90% accuracy on the training data but fail on the validation and test data, where every image is assigned to the same class (all 1 or all 0). I have tried image augmentation and dropout, but neither helped. I would appreciate any comments on the following code. Thanks so much!
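# Imports are not shown in the original post; tensorflow.keras is assumed here.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import ResNet152
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from tensorflow.keras.utils import to_categorical

IMAGE_WIDTH = 224
IMAGE_HEIGHT = 224
IMAGE_CHANNELS = 3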
train_data, valid_data, train_label, valid_label = train_test_split(trainval_data, trainval_label, test_size=0.05, random_state=999)
train_label = to_categorical(train_label)
valid_label = to_categorical(valid_label)
test_label = to_categorical(test_label)
train_array = np.zeros((len(train_data), IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS))
test_array = np.zeros((len(test_data), IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS))
valid_array = np.zeros((len(valid_data), IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS))
for i in range(len(train_data)):
    image = load_img(train_data[i], target_size=(224, 224))
    train_array[i] = img_to_array(image)
for i in range(len(test_data)):
    image = load_img(test_data[i], target_size=(224, 224))
    test_array[i] = img_to_array(image)
for i in range(len(valid_data)):
    image = load_img(valid_data[i], target_size=(224, 224))
    valid_array[i] = img_to_array(image)
train_array = train_array/255.0
test_array = test_array/255.0
valid_array = valid_array/255.0
def img_transfer(image):
    image = image - image.mean()
    return image
# data pre-processing for training
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest',
    horizontal_flip=True,
    preprocessing_function=img_transfer)
# data pre-processing for validation
validate_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest',
    horizontal_flip=True,
    preprocessing_function=img_transfer)
test_datagen = ImageDataGenerator(
    preprocessing_function=img_transfer)
train_datagen.fit(train_array, augment=True, seed=8021)
train_generator = train_datagen.flow(train_array, train_label, shuffle=True, seed = 8021)
validate_datagen.fit(valid_array, augment=True, seed=8021)
val_generator = validate_datagen.flow(valid_array, valid_label, shuffle=True, seed = 8021)
resnet152model = ResNet152(include_top=False, classes=2, input_shape = (224,224,3))
#print(vgg16model.summary())
for layer in resnet152model.layers:
    layer.trainable = False
x = resnet152model.output
x = Flatten()(x)
x = Dense(512, activation="relu")(x)
x = Dense(256, activation="relu")(x)
predictions = Dense(2, activation="softmax")(x)
resnet152model = Model(inputs=resnet152model.input,outputs=predictions)
earlystop = EarlyStopping(patience=10)
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
                                            patience=5,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)
filepath="weights-improvement-{epoch:02d}-{val_accuracy:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
callbacks_list = [earlystop, checkpoint, learning_rate_reduction]
resnet152model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history1 = resnet152model.fit_generator(train_generator, validation_data=val_generator,
                                        epochs=30, verbose=1, callbacks=callbacks_list)
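For reference, here is a small sketch of the learning rates the `ReduceLROnPlateau` settings above can step through. It assumes Adam starts at its Keras default learning rate of 0.001, since the post passes `optimizer='adam'` without an explicit rate. The helper name `lr_ladder` is made up for illustration and is not part of Keras.

```python
# Sketch: learning rates reachable with factor=0.5 and min_lr=0.00001.
# Assumption: the optimizer starts at 0.001 (the Keras default for Adam).


def lr_ladder(initial_lr=0.001, factor=0.5, min_lr=0.00001):
    """Return the learning rate after 0, 1, 2, ... plateau-triggered reductions."""
    rates = [initial_lr]
    while rates[-1] > min_lr:
        # Each reduction multiplies the rate by `factor`, clamped at `min_lr`.
        rates.append(max(rates[-1] * factor, min_lr))
    return rates


ladder = lr_ladder()
for reductions, lr in enumerate(ladder):
    print(f"after {reductions} reduction(s): lr = {lr:.8f}")
```

Under that assumption the first reduction gives 0.0005, which matches the `ReduceLROnPlateau` message in the log below, and the rate reaches the 0.00001 floor after seven reductions.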
Epoch 1/30
79/79 [==============================] - 65s 819ms/step - loss: 3.4226 - accuracy: 0.7673 - val_loss: 0.5739 - val_accuracy: 0.6818
Epoch 00001: val_accuracy improved from -inf to 0.68182, saving model to weights-improvement-01-0.68.hdf5
Epoch 2/30
79/79 [==============================] - 44s 559ms/step - loss: 0.7746 - accuracy: 0.8092 - val_loss: 0.3414 - val_accuracy: 0.6818
Epoch 00002: val_accuracy did not improve from 0.68182
Epoch 3/30
79/79 [==============================] - 44s 559ms/step - loss: 0.4426 - accuracy: 0.8407 - val_loss: 0.7188 - val_accuracy: 0.6818
Epoch 00003: val_accuracy did not improve from 0.68182
Epoch 4/30
79/79 [==============================] - 44s 560ms/step - loss: 0.4133 - accuracy: 0.8415 - val_loss: 0.5881 - val_accuracy: 0.6818
Epoch 00004: val_accuracy did not improve from 0.68182
Epoch 5/30
79/79 [==============================] - 44s 558ms/step - loss: 0.3836 - accuracy: 0.8595 - val_loss: 1.2216 - val_accuracy: 0.3182
Epoch 00005: val_accuracy did not improve from 0.68182
Epoch 6/30
79/79 [==============================] - 44s 558ms/step - loss: 0.3961 - accuracy: 0.8551 - val_loss: 1.0454 - val_accuracy: 0.3182
Epoch 00006: val_accuracy did not improve from 0.68182
Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/30
79/79 [==============================] - 44s 558ms/step - loss: 0.3074 - accuracy: 0.8719 - val_loss: 0.9247 - val_accuracy: 0.3182
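The validation accuracies in this log can be checked against the "every image gets the same class" symptom with a little arithmetic. The sketch below is only a consistency check on the printed numbers. It assumes that `flow()` used its default batch size of 32 and that `train_test_split` rounded the 5% validation share up; neither is stated explicitly in the post.

```python
from fractions import Fraction
from math import ceil

# Values read from the training log above.
VAL_ACC_EARLY = "0.6818"  # epochs 1-4
VAL_ACC_LATE = "0.3182"   # epochs 5-7
STEPS_PER_EPOCH = 79

# Assumptions (not stated in the post): default batch size of 32 and a
# validation share of 5% rounded up to a whole number of images.
BATCH_SIZE = 32
VAL_FRACTION = Fraction(5, 100)

# 1) If the model gives every image one label and later every image the
#    other label, the two accuracies must add up to 1.
complement_gap = abs(Fraction(VAL_ACC_EARLY) + Fraction(VAL_ACC_LATE) - 1)

# 2) Simplest fraction that rounds to the early accuracy.
simplest = Fraction(VAL_ACC_EARLY).limit_denominator(200)


def candidate_val_sizes():
    """Validation-set sizes compatible with 79 training steps per epoch."""
    sizes = set()
    for n_total in range(1, 5000):
        n_val = ceil(VAL_FRACTION * n_total)
        n_train = n_total - n_val
        if ceil(n_train / BATCH_SIZE) == STEPS_PER_EPOCH:
            sizes.add(n_val)
    return sorted(sizes)


sizes = candidate_val_sizes()
# Keep only sizes for which the simplest fraction gives a whole image count.
matching = [n for n in sizes if n % simplest.denominator == 0]

print("gap from perfect complement:", complement_gap)
print("simplest matching fraction:", simplest)
print("validation sizes consistent with the step count:", sizes)
print("sizes also consistent with the accuracy:", matching)
for n in matching:
    majority = n * simplest.numerator // simplest.denominator
    print(f"{n} images -> {majority} of one class, {n - majority} of the other")
```

Under these assumptions the numbers fit a validation set of 132 images split 90/42 between the two classes, with the model predicting the majority class for epochs 1-4 and the minority class from epoch 5 onward. That agrees with the symptom described at the top of the post rather than with a gradual loss of generalization.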