I am training a convolutional neural network to classify an image into one of five classes (Class 1 - Class 5).
I have very few training images for Class 1, so I augmented them by taking random crops and flipping the images to create more data. I have at least 3000 training images for each of Classes 2 - 5. My training set now consists of 3000 images per class, and I train the network using stochastic gradient descent.
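For readers unfamiliar with this kind of augmentation, a minimal sketch of a random crop plus horizontal flip might look like the following. The function name `augment`, the 300 x 300 stand-in image, the 256 crop size, and the use of NumPy are assumptions for illustration, not the exact pipeline used here.

```python
import numpy as np

def augment(image, crop_size, rng):
    """Return a random square crop of `image`, flipped horizontally half the time.

    image: array of shape (height, width, channels)
    crop_size: side length of the crop (hypothetical parameter)
    rng: a numpy.random.Generator, so results are reproducible
    """
    height, width = image.shape[:2]
    if crop_size > min(height, width):
        raise ValueError("crop_size must fit inside the image")
    # Pick the top-left corner of the crop uniformly at random.
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size]
    # Mirror left-to-right with probability 0.5.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop

rng = np.random.default_rng(0)
# Stand-in for a real photo: random pixel values, 300 x 300 x 3.
source = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)
samples = [augment(source, 256, rng) for _ in range(4)]
print([s.shape for s in samples])
```

Crops and flips of the same few source images are strongly correlated with one another, so 3000 augmented Class 1 images carry less independent information than 3000 distinct images.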
My test set consists of:
Class 1 - 8 images
Class 2 - 83 images
Class 3 - 227 images
Class 4 - 401 images
Class 5 - 123 images
My network correctly predicts:
Class 1 - 0 images
Class 2 - 0 images
Class 3 - 0 images
Class 4 - 399 images
Class 5 - 0 images
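To put these counts in perspective, the short sketch below computes per-class recall, overall accuracy, and the accuracy of a trivial classifier that always outputs the most common test class. The numbers are copied from the two lists above; the variable names are just illustrative.

```python
# Test-set sizes and correct predictions per class, copied from the lists above.
test_counts = {1: 8, 2: 83, 3: 227, 4: 401, 5: 123}
correct = {1: 0, 2: 0, 3: 0, 4: 399, 5: 0}

# Per-class recall: fraction of each class's test images predicted correctly.
recall = {c: correct[c] / test_counts[c] for c in test_counts}

total = sum(test_counts.values())
accuracy = sum(correct.values()) / total

# Baseline: a classifier that always outputs the most common test class.
majority_class = max(test_counts, key=test_counts.get)
majority_baseline = test_counts[majority_class] / total

for c in sorted(recall):
    print(f"Class {c}: recall = {recall[c]:.3f}")
print(f"Overall accuracy  = {accuracy:.3f}")
print(f"Majority baseline = {majority_baseline:.3f} (always Class {majority_class})")
```

Overall accuracy comes out at about 47.4% (399 of 842), just under the 47.6% that always answering Class 4 would give. That pattern is consistent with the network outputting Class 4 for nearly every test image, although the counts alone do not show what was predicted for the misclassified images.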
I don't expect a very accurate network given the limitations of my training set, and 15000 images are probably not enough either. Still, I would not have expected the results to be this skewed, given that Classes 2 - 5 each had the same number of distinct training images. If I had trained the network on a much larger proportion of Class 4 images, this would not surprise me. I would have expected the network to predict at least SOME of the other classes correctly.
Any thoughts?
EDIT:
Types of images: Buildings
Network architecture:
Input image - 256 x 256 x 3
Convolutional layer - 15 x 15 filters, 16 filters
Max 2x2 pooling layer
Convolutional layer - 11 x 11 filters, 32 filters
Max 2x2 pooling layer
Convolutional layer - 7 x 7 filters, 64 filters
Max 2x2 pooling layer
Fully connected layer - 1024 outputs
Softmax classifier layer - 5 outputs
Cost function: Cross-entropy
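As a rough sanity check on the architecture, the sketch below walks the feature-map sizes and parameter counts through the layers listed above. The post does not state strides or padding, so this assumes stride-1 convolutions with no padding and non-overlapping 2x2 pooling that rounds down; different settings would change the numbers.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling (rounds down)."""
    return size // window

# (kernel size, number of filters) for each convolutional layer, from the list above.
conv_layers = [(15, 16), (11, 32), (7, 64)]

size, channels = 256, 3  # 256 x 256 x 3 input
total_params = 0
for kernel, filters in conv_layers:
    # Each filter has kernel * kernel * channels weights plus one bias.
    params = (kernel * kernel * channels + 1) * filters
    size = pool_out(conv_out(size, kernel))  # assumed: stride 1, no padding
    channels = filters
    total_params += params
    print(f"conv {kernel}x{kernel}, {filters} filters + pool -> "
          f"{size} x {size} x {channels}, {params:,} params")

flat = size * size * channels
fc_params = (flat + 1) * 1024    # fully connected layer, 1024 outputs
softmax_params = (1024 + 1) * 5  # softmax classifier layer, 5 outputs
total_params += fc_params + softmax_params
print(f"flattened features: {flat:,}")
print(f"fully connected: {fc_params:,} params, softmax: {softmax_params:,} params")
print(f"total: {total_params:,} params")
```

Under these assumptions the fully connected layer accounts for roughly 37.7 million of about 37.9 million parameters, which is large relative to 15000 training images.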