0 votes

I want to classify patterns in images. My original images are 200,000 × 200,000; I resized them to 96 × 96, and the patterns are still recognizable to the human eye. Pixel values are 0 or 1.

I'm using the following neural network:


    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.utils import class_weight
    from keras.utils import to_categorical

    train_X, test_X, train_Y, test_Y = train_test_split(cnn_mat, img_bin["Classification"],
                                                        test_size=0.2, random_state=0)

    # per-class weights to compensate for the class imbalance
    class_weights = class_weight.compute_class_weight('balanced',
                                                      classes=np.unique(train_Y),
                                                      y=train_Y)
    # Keras expects class weights as a dict {class index: weight}
    class_weights = dict(enumerate(class_weights))

    train_Y_one_hot = to_categorical(train_Y)
    test_Y_one_hot = to_categorical(test_Y)

    train_X, valid_X, train_label, valid_label = train_test_split(train_X, train_Y_one_hot,
                                                                  test_size=0.2, random_state=13)


    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

    model = Sequential()
    model.add(Conv2D(24, kernel_size=3, padding='same', activation='relu',
                     input_shape=(96, 96, 1)))
    model.add(MaxPool2D())
    model.add(Conv2D(48, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Conv2D(64, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(16, activation='softmax'))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    train = model.fit(train_X, train_label, batch_size=80, epochs=20, verbose=1,
                      validation_data=(valid_X, valid_label), class_weight=class_weights)

I have already run some experiments to find a "good" number of hidden layers and fully connected layers. It's probably not the most optimal architecture: since my computer is slow, I just ran each candidate model once and selected the best one by its confusion matrix, without cross-validation. I didn't try more complex architectures since my dataset is small, and I have read that small architectures work best in that case. Is it worth trying a more complex architecture?

Here are the results with 5 and 12 epochs, batch size 80. This is the confusion matrix for my test set.

As you can see, it looks like I'm overfitting. When I only run 5 epochs, most classes are assigned to class 0; with more epochs, class 0 dominates less, but the classification is still bad.
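For reference, a test-set confusion matrix like the ones above can be computed with scikit-learn (a minimal sketch using the variable names from my code):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # predicted class = argmax over the softmax outputs
    pred = np.argmax(model.predict(test_X), axis=1)
    print(confusion_matrix(test_Y, pred))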

I added a dropout of 0.8 after each convolutional layer, e.g.:

    model.add(Conv2D(48,kernel_size=3,padding='same',activation='relu'))
    model.add(MaxPool2D())
    model.add(Dropout(0.8))
    model.add(Conv2D(64,kernel_size=3,padding='same',activation='relu'))
    model.add(MaxPool2D())
    model.add(Dropout(0.8))

With dropout, 95% of my images are classified as class 0.

I tried image augmentation: I rotated all my training images while still using the class weights, but the results didn't improve. Should I augment only the classes with few images? Most of what I've read says to augment the whole dataset...
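The rotation augmentation I tried looks roughly like this sketch (using Keras' ImageDataGenerator; the exact parameter values here are illustrative):

    from keras.preprocessing.image import ImageDataGenerator

    # random rotations; pixels uncovered by the rotation are filled
    # with the background value 0
    datagen = ImageDataGenerator(rotation_range=90, fill_mode='constant', cval=0)

    train = model.fit_generator(datagen.flow(train_X, train_label, batch_size=80),
                                steps_per_epoch=len(train_X) // 80,
                                epochs=20,
                                validation_data=(valid_X, valid_label),
                                class_weight=class_weights)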

To summarize, my questions are: Should I try a more complex model?

Is it useful to do image augmentation only on underrepresented classes? If so, should I still use class weights (I guess not)?

Can I hope to find a "good" model with a CNN, given the size of my dataset?

I do see that you have tried your best, but you can still try the options below to reduce overfitting (a sketch combining them follows this comment thread):
1. Use horizontal flipping of images as data augmentation rather than rotation.
2. Use a bigger filter (kernel) size in the first layer, maybe kernel_size=5 or 7.
3. Instead of dropout in the convolutional layers, add it after the dense layers; dropout is nearly useless in convolutional layers.
4. Try a smaller model, reducing the number of filters to 16 in the first layer, then 32 in the second, and so on.
- Ashwin Geet D'Sa
Do let me know if it works out and if it can be added as a solution. - Ashwin Geet D'Sa
The number of filters and the kernel size don't really impact performance... the "shape" of my confusion matrix looks the same: with few epochs a lot of my images are still classified as 0, and with more epochs class 0 is less dominant but the classification is still bad. I'll try new data augmentation. Thanks for your time, btw. - akhetos
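A minimal sketch combining the suggestions from the first comment (larger first kernel, fewer filters, dropout after the dense layer instead of the conv layers); the exact layer sizes are assumptions:

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

    model = Sequential()
    # bigger first kernel (7x7) and fewer filters (16) than the original model
    model.add(Conv2D(16, kernel_size=7, padding='same', activation='relu',
                     input_shape=(96, 96, 1)))
    model.add(MaxPool2D())
    model.add(Conv2D(32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPool2D())
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))  # dropout after the dense layer, not the conv layers
    model.add(Dense(16, activation='softmax'))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])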

2 Answers

2 votes

Given the imbalanced data, I think it is better to create a custom data generator for your model, so that each generated batch contains at least one sample from each class. It is also better to use a Dropout layer after each dense layer instead of after the conv layers. For data augmentation, it is better to use at least a combination of rotation, horizontal flip, and vertical flip. There are other approaches to data augmentation, such as using a GAN or random pixel replacement. For GANs, you can check This SO post.
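A minimal sketch of such a generator (a hypothetical helper written against the variable names from the question; it assumes the batch size is at least the number of classes):

    import numpy as np

    def balanced_batch_generator(X, y_one_hot, batch_size):
        """Yield batches that contain at least one sample of every class."""
        labels = np.argmax(y_one_hot, axis=1)
        per_class = {c: np.where(labels == c)[0] for c in np.unique(labels)}
        while True:
            # one guaranteed sample per class, the rest drawn at random
            idx = [np.random.choice(rows) for rows in per_class.values()]
            idx += list(np.random.choice(len(X), batch_size - len(idx), replace=False))
            idx = np.random.permutation(idx)
            yield X[idx], y_one_hot[idx]

    train = model.fit_generator(balanced_batch_generator(train_X, train_label, 80),
                                steps_per_epoch=len(train_X) // 80,
                                epochs=20,
                                validation_data=(valid_X, valid_label))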

For using a GAN as a data augmenter, you can read This Article. For a combination of pixel-level augmentation and GANs, see pixel level data augmentation.

1 vote

What I used, in a different setting, was to upsample my data with ADASYN. This algorithm calculates the amount of new data required to balance your classes, and then samples from the available data to synthesize novel examples. There is a Python implementation in the imbalanced-learn package.
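A minimal sketch with imbalanced-learn, applied right after the first train/test split from the question, before the one-hot encoding (ADASYN expects a 2-D feature matrix, hence the flattening):

    from imblearn.over_sampling import ADASYN

    # flatten the images to (n_samples, 96*96), oversample, then reshape for the CNN
    X_flat = train_X.reshape(len(train_X), -1)
    X_res, y_res = ADASYN(random_state=0).fit_resample(X_flat, train_Y)
    X_res = X_res.reshape(-1, 96, 96, 1)
    # y_res would then be one-hot encoded with to_categorical, as before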

You also have very little data. SVMs perform well even with little data, so you might want to try them, or other image classification algorithms, depending on whether the expected pattern is always at the same position or varies. You could also try the Viola–Jones object detection framework.
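As a baseline, a scikit-learn SVM sketch on the flattened binary images (again using the variable names from the question):

    import numpy as np
    from sklearn.svm import SVC

    # recover integer labels from the one-hot vectors used for the CNN
    y_train = np.argmax(train_label, axis=1)

    # class_weight='balanced' plays the same role as the CNN's class weights
    svm = SVC(kernel='rbf', class_weight='balanced')
    svm.fit(train_X.reshape(len(train_X), -1), y_train)
    print(svm.score(test_X.reshape(len(test_X), -1), test_Y))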