Convolutional neural network making skewed predictions

Question

I am training a convolutional neural network to classify an image into one of five classes (Class 1 - Class 5).

I have very few training images for Class 1, so I performed some data augmentation by taking random crops and flipping the images to create more data. I have at least 3000 training images for each of Class 2 - 5. My training set now consists of 3000 images for each class, and I train the network using stochastic gradient descent.

My testing set consists of:

Class 1 - 8 images
Class 2 - 83 images
Class 3 - 227 images
Class 4 - 401 images
Class 5 - 123 images

My network correctly predicts:

Class 1 - 0 images
Class 2 - 0 images
Class 3 - 0 images
Class 4 - 399 images
Class 5 - 0 images
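
For reference, the per-class recall and overall accuracy implied by the counts above can be worked out with a few lines of Python. This is only a sketch built from the numbers listed in the question; the variable and function names are made up for illustration, and since only the correct predictions are reported (not a full confusion matrix), it cannot show which wrong class each image was assigned to.

```python
# Test-set sizes and correct predictions, copied from the lists above.
test_counts = {"Class 1": 8, "Class 2": 83, "Class 3": 227, "Class 4": 401, "Class 5": 123}
correct = {"Class 1": 0, "Class 2": 0, "Class 3": 0, "Class 4": 399, "Class 5": 0}


def per_class_recall(correct_by_class, total_by_class):
    """Fraction of each class's test images that were predicted correctly."""
    return {name: correct_by_class[name] / total_by_class[name] for name in total_by_class}


recall = per_class_recall(correct, test_counts)

total_images = sum(test_counts.values())
overall_accuracy = sum(correct.values()) / total_images

# Accuracy of a trivial model that always answers with the largest test class.
majority_baseline = max(test_counts.values()) / total_images

# Mean of the per-class recalls; unlike overall accuracy, every class counts equally.
balanced_accuracy = sum(recall.values()) / len(recall)

for name, value in recall.items():
    print(f"{name}: recall = {value:.3f}")
print(f"overall accuracy  = {overall_accuracy:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
print(f"balanced accuracy = {balanced_accuracy:.3f}")
```

The overall accuracy (399 of 842, about 47%) is almost the same as what a model that always answers Class 4 would score (401 of 842), which is consistent with the network predicting Class 4 for nearly every image.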

I don't expect a very accurate network given the limitations of my training set, and 15000 images are probably not enough either, but I would not have expected it to be so skewed given that Class 2 - 5 had the same number of distinct training images. If I had trained my network on a much larger proportion of Class 4 images, this would not surprise me. I would have expected the network to predict at least SOME of the other classes correctly.

Any thoughts?

EDIT:

Types of images: Buildings

Network architecture:

Input image - 256 x 256 x 3
Convolutional layer - 15 x 15 filters, 16 filters
Max 2x2 pooling layer
Convolutional layer - 11 x 11 filters, 32 filters
Max 2x2 pooling layer
Convolutional layer - 7 x 7 filters, 64 filters
Max 2x2 pooling layer
Fully connected layer - 1024 outputs
Softmax classifier layer - 5 outputs
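
To get a feel for the size of this network, the feature-map sizes and parameter counts can be estimated as below. The stride and padding are not stated in the question, so this sketch assumes stride 1 and no padding for every convolution, non-overlapping 2x2 pooling that rounds down, and one bias per filter or output unit; the helper names are hypothetical. With different strides or padding the numbers change.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (assumed: stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling (rounds down)."""
    return size // window


def summarize(input_size=256, input_channels=3,
              conv_layers=((15, 16), (11, 32), (7, 64)),
              fc_units=1024, num_classes=5):
    """Return per-layer (name, output shape, parameter count) under the stated assumptions."""
    size, channels = input_size, input_channels
    rows = []
    for index, (kernel, filters) in enumerate(conv_layers, start=1):
        params = kernel * kernel * channels * filters + filters  # weights + biases
        size = pool_out(conv_out(size, kernel))                  # conv, then 2x2 pool
        channels = filters
        rows.append((f"conv{index}+pool", (size, size, channels), params))
    flat = size * size * channels
    rows.append(("fc", (fc_units,), flat * fc_units + fc_units))
    rows.append(("softmax", (num_classes,), fc_units * num_classes + num_classes))
    return rows


layers = summarize()
for name, shape, params in layers:
    print(f"{name:12s} output={shape!s:16s} params={params:,}")
print(f"total params = {sum(p for _, _, p in layers):,}")
```

Under these assumptions nearly all of the parameters sit in the fully connected layer, while the three convolutional layers together hold well under one percent of them.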

Cost function: Cross-entropy

There's a specific datascience stack exchange - maybe your question would be better there? — user3791372
You should describe the architecture of your convnet as well as the objects you are classifying. — Eli Korvigo
Having skewed classification has nothing to do with using convolutions. Please report: the exact sizes of the training sets, the training errors (the crucial part), and the training method used. On a side note, your network looks quite simple (small) given the size of the input. — lejlot

Bharat Bharat · Accepted Answer · 2016-05-09T21:31:01

This can happen, although it is not very common. I think you have not trained long enough. A CNN tends to learn to get one class right at a time, generally the class with the most samples if you have not normalized the loss by class frequency. This is because, early in training, it gains the most from predicting that class correctly. As training progresses, that gain shrinks and the network starts to predict the other classes correctly as well.

You can weight your softmax loss by class frequency or re-sample your dataset to get around this problem. I also see that your CNN is not deep enough, the filter sizes are not appropriate for your input resolution, and you have too few training samples. I would recommend fine-tuning a pre-trained network such as VGG, GoogLeNet, ResNet or AlexNet for your task.
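
The class-frequency weighting mentioned above can be sketched in plain Python as follows. This is only an illustration, not the loss of any particular framework: the function names, the inverse-frequency formula `total / (num_classes * count)`, and the imbalanced example counts are all assumptions made for the sketch. Note that with the balanced training set described in the question (3000 images per class) these weights all come out as 1.0, so weighting only changes anything when the training counts actually differ.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight for class k = total / (num_classes * count_k); rarer classes weigh more."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(batch_logits, labels, class_weights):
    """Cross-entropy where each sample is scaled by the weight of its true class.

    The result is normalised by the sum of the weights used, so it stays
    comparable to the unweighted mean loss.
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        weight = class_weights[label]
        loss_sum += -weight * math.log(probs[label])
        weight_sum += weight
    return loss_sum / weight_sum


# Balanced counts, as in the question: every weight is 1.0.
balanced = inverse_frequency_weights([3000] * 5)

# Hypothetical imbalanced counts, for illustration only.
imbalanced = inverse_frequency_weights([200, 3000, 3000, 6000, 3000])

# Two made-up samples: one from the rare class 0, one from the common class 3.
logits = [[0.1, 0.2, 0.0, 2.0, 0.3], [0.0, 0.1, 0.2, 2.5, 0.1]]
labels = [0, 3]
print("balanced weights:  ", balanced)
print("imbalanced weights:", [round(w, 3) for w in imbalanced])
print("unweighted loss:", weighted_cross_entropy(logits, labels, balanced))
print("weighted loss:  ", weighted_cross_entropy(logits, labels, imbalanced))
```

In this made-up example the misclassified rare-class sample dominates the weighted loss, so the gradient pushes harder on the rare class than it would with the plain mean.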

For Caffe, you can follow this example: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
