1 vote

I am training a convolutional neural network to classify an image into one of five classes (Class 1 - Class 5).

I have very few training images for Class 1, so I performed some data augmentation, taking random crops and flipping the images to create more data (sketched below). I have at least 3000 training images for each of Classes 2 - 5. My training set now consists of 3000 images per class, and I train using stochastic gradient descent.
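For reference, the augmentation looks roughly like this (a minimal NumPy sketch; it assumes the source photos are larger than the 256 x 256 network input, and the helper name is mine):

    import numpy as np

    rng = np.random.default_rng(0)

    def augment(image, out_size=256):
        # Random out_size x out_size crop plus a coin-flip horizontal mirror.
        # Assumes image is an H x W x 3 array with H, W >= out_size.
        h, w, _ = image.shape
        top = rng.integers(0, h - out_size + 1)
        left = rng.integers(0, w - out_size + 1)
        crop = image[top:top + out_size, left:left + out_size]
        if rng.random() < 0.5:
            crop = crop[:, ::-1]  # horizontal flip
        return crop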

My testing set consists of:

Class 1 - 8 images
Class 2 - 83 images
Class 3 - 227 images
Class 4 - 401 images
Class 5 - 123 images

My network correctly predicts:

Class 1 - 0 images
Class 2 - 0 images
Class 3 - 0 images
Class 4 - 399 images
Class 5 - 0 images

I don't expect a very accurate network given the limitations of my training set, and 15,000 images are probably not enough either - but I would not have expected the results to be so skewed, given that Classes 2 - 5 had the same number of distinct training images. If I had trained the network on a much larger proportion of Class 4 images, this would not surprise me. I would have expected the network to predict at least SOME of the other classes correctly.

Any thoughts?

EDIT:

Types of images: Buildings

Network architecture (see the Keras sketch below):

Input image - 256 x 256 x 3
Convolutional layer - 16 filters, 15 x 15
Max-pooling layer - 2 x 2
Convolutional layer - 32 filters, 11 x 11
Max-pooling layer - 2 x 2
Convolutional layer - 64 filters, 7 x 7
Max-pooling layer - 2 x 2
Fully connected layer - 1024 outputs
Softmax classifier layer - 5 outputs

Cost function: Cross-entropy
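For concreteness, here is the architecture above in Keras (a sketch; the ReLU activations and the default strides and padding are assumptions, since I did not list them):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(256, 256, 3)),
        layers.Conv2D(16, (15, 15), activation='relu'),  # 16 filters, 15 x 15
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (11, 11), activation='relu'),  # 32 filters, 11 x 11
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (7, 7), activation='relu'),    # 64 filters, 7 x 7
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(1024, activation='relu'),           # fully connected, 1024 outputs
        layers.Dense(5, activation='softmax'),           # softmax over the 5 classes
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy')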

There's a specific Data Science Stack Exchange - maybe your question would be better there? – user3791372
You should describe the architecture of your convnet as well as the objects you are classifying. – Eli Korvigo
@user3791372 Thanks for pointing me in that direction! – jlhw
@EliKorvigo I have added those in as well. Thank you! – jlhw
Having a skewed classification has nothing to do with using convolutions. Please report: the exact sizes of the training sets, the training errors (the crucial part), and the training method used. On a side note - your network looks quite simple (small) given the size of the input. – lejlot

2 Answers

0 votes

This can happen (although it is not very common). I think you have not trained for long enough. A CNN tends to get one class right at a time, generally the one with the most samples if you have not normalized the loss, because early in training it gets the most benefit from predicting that class correctly. As it improves over time, that benefit shrinks, and it starts trying to predict the other classes correctly as well.

You can weight your softmax loss by the class frequencies, or re-sample your dataset, to get around this problem (a sketch of the weighting follows). I also see that your CNN is not deep enough, the filter sizes are not appropriate for your input resolution, and the number of training samples is too small. I would recommend fine-tuning a pre-trained network such as VGG, GoogLeNet, ResNet or AlexNet for your task.
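A minimal sketch of inverse-frequency weighting, assuming a Keras model and integer labels (the labels below are placeholders, and scaling so that a balanced set gives weight 1.0 is just one common choice):

    import numpy as np

    # y_train: integer class labels 0-4, one per training image (placeholder here)
    y_train = np.random.randint(0, 5, size=15000)

    n_classes = 5
    counts = np.bincount(y_train, minlength=n_classes)
    # inverse-frequency weights, scaled so a perfectly balanced set gives 1.0
    class_weight = {c: len(y_train) / (n_classes * counts[c]) for c in range(n_classes)}

    # Keras then scales each sample's loss by the weight of its class:
    # model.fit(x_train, y_train, class_weight=class_weight, ...)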

For Caffe you can follow this example: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
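The same idea in Keras looks roughly like this (a sketch using VGG16; the frozen base, the 256-unit head and the SGD settings are my assumptions, not a recipe from the question):

    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import VGG16

    # pre-trained convolutional features, without the ImageNet classifier head
    base = VGG16(weights='imagenet', include_top=False, input_shape=(256, 256, 3))
    base.trainable = False  # freeze the pre-trained weights, train only the new head

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(5, activation='softmax'),  # the 5 target classes
    ])
    model.compile(optimizer=optimizers.SGD(learning_rate=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])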

0 votes

I think I am quite late to answer, but I'm willing to share my experience/knowledge. :)

In practice, it is recommended to use small filter sizes such as 3x3, or 5x5 at most, since these give far fewer parameters, which reduces training time with no loss in accuracy compared to sizes like 15x15. There has been plenty of research on this (see the ImageNet competition winners from 2013-2015, and the snippet below for a quick parameter count).
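For the first layer of the architecture in the question (RGB input, 16 filters), a back-of-the-envelope count:

    # parameters in a conv layer = kernel_h * kernel_w * in_channels * filters + biases
    def conv_params(k, in_ch, filters):
        return k * k * in_ch * filters + filters

    print(conv_params(15, 3, 16))  # 15 x 15 kernels: 10816 parameters
    print(conv_params(3, 3, 16))   #  3 x 3  kernels:   448 parameters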

One possible reason your model is biased toward one class is that the classes are not balanced. What you can do is penalize the model more heavily for mistakes on the classes with fewer instances. In Keras, there is a class_weight parameter that lets you scale the loss function per class, as in the sketch below.
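A minimal sketch of that, using scikit-learn's 'balanced' heuristic to derive the weights (the labels below are placeholders, and the model and training arrays are assumed to exist already):

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y_train = np.random.randint(0, 5, size=15000)  # placeholder integer labels, 0-4

    weights = compute_class_weight(class_weight='balanced',
                                   classes=np.arange(5), y=y_train)
    class_weight = dict(enumerate(weights))

    # pass the mapping to fit so rarer classes contribute more to the loss:
    # model.fit(x_train, y_train, class_weight=class_weight, ...)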

1 - Sequential - Keras

2 - How to set class weights for imbalanced classes in Keras?