
I'm building a neural net using TensorFlow and Python, and using the Kaggle 'First Steps with Julia' dataset to train and test it. The training images are basically a set of images of different numbers and letters picked out of Google Street View, from street signs, shop names, etc. The network has two fully-connected hidden layers.

The problem I have is that the network will very quickly train itself to only give back one answer: the most common training letter (in my case 'A'). The output is in the form of a (62, 1) vector of probabilities, one for each number and letter (upper- and lower-case). This vector is EXACTLY the same for all input images.

I then tried removing all of the 'A's from my input data, at which point the network changed to only give back the next most common input type (an 'E').

So, is there some way to stop my network getting stuck in a local minimum (not sure if that's the actual term)? Is this a general problem for neural networks, or is it just that my network is broken somehow?

I'm happy to provide code if it would help.

EDIT: These are the hyperparameters of my network:

Input size : 400 (20x20 greyscale images)
Hidden layer 1 size : 100
Hidden layer 2 size : 100
Output layer size : 62 (Alphanumeric, lower- and upper-case)

Training data size : 4283 images
Validation data size : 1000 images
Test data size : 1000 images

Batch size : 100
Learning rate : 0.5
Dropout rate : 0.5
L2 regularisation parameter : 0
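
For reference, the network is shaped roughly like this (a simplified TF 1.x-style sketch rather than my exact code):

  import tensorflow as tf

  x = tf.placeholder(tf.float32, [None, 400])   # 20x20 greyscale images, flattened
  y = tf.placeholder(tf.float32, [None, 62])    # one-hot labels
  keep_prob = tf.placeholder(tf.float32)        # 0.5 during training, 1.0 at test time

  def dense(inputs, n_in, n_out):
      # Fully-connected layer with small random initial weights
      W = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=0.1))
      b = tf.Variable(tf.zeros([n_out]))
      return tf.matmul(inputs, W) + b

  h1 = tf.nn.dropout(tf.nn.relu(dense(x, 400, 100)), keep_prob)
  h2 = tf.nn.dropout(tf.nn.relu(dense(h1, 100, 100)), keep_prob)
  logits = dense(h2, 100, 62)

  loss = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)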

Maybe your learning rate is too high? Could you post the hyperparameters of your model (architecture, learning rate, batch size...)? – Olivier Moindrot
I've edited my question to include the hyperparameters. – LomaxOnTheRun

4 Answers

2 votes

Trying to squeeze blood from a stone!

I'm skeptical that with 4283 training examples your net will learn 62 categories; that's a big ask for such a small amount of data, especially since your net is not a conv net and it's forced to reduce its dimensionality to 100 at the first layer. You might as well run PCA on it and save the time.

Try this:
Step 1: Download an MNIST example and learn how to train and run it.

Step 2: Use the same MNIST network design and throw your data at it, and see how it goes. You may need to pad your images (see the sketch just below). Train it, then run it on your test data.
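
For the padding, something like this would take your 20x20 images up to MNIST's 28x28 (assuming they're stored as a NumPy array; the helper name is mine):

  import numpy as np

  def pad_to_mnist_size(images):
      # images: array of shape (num_images, 20, 20); returns (num_images, 28, 28)
      # with 4 pixels of zero padding on each side
      return np.pad(images, ((0, 0), (4, 4), (4, 4)), mode="constant")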

Now Step 3: Take your fully trained Step 1 MNIST model and "finetune" it by continuing to train it with your data only, and with a lower learning rate, for a few epochs (ultimately determine the number of epochs by validation). Then run it on your test data again and see how it does. Look up "transfer learning", and a "finetuning example" for your toolkit. (Note that for finetuning you need to modify the output layer of the net.)
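
A very rough TF 1.x-style sketch of the finetuning idea, assuming the MNIST net had a single 100-unit hidden layer (all the names here, including the checkpoint path, are assumptions rather than your actual code): restore the hidden-layer weights from the MNIST run, bolt on a fresh 62-way output layer, and keep training with a smaller learning rate.

  import tensorflow as tf

  x = tf.placeholder(tf.float32, [None, 784])   # padded 28x28 inputs, flattened
  y = tf.placeholder(tf.float32, [None, 62])

  W1 = tf.Variable(tf.truncated_normal([784, 100], stddev=0.1), name="W1")
  b1 = tf.Variable(tf.zeros([100]), name="b1")
  h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

  # New output layer: the MNIST net only had 10 outputs, so this part is fresh
  W_out = tf.Variable(tf.truncated_normal([100, 62], stddev=0.1), name="W_out")
  b_out = tf.Variable(tf.zeros([62]), name="b_out")
  logits = tf.matmul(h1, W_out) + b_out

  loss = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
  train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)  # lower rate

  # Restore only the hidden-layer parameters saved under these (assumed) names
  saver = tf.train.Saver({"W1": W1, "b1": b1})
  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      saver.restore(sess, "mnist_model.ckpt")   # hypothetical checkpoint path
      # ...then loop over your own data for a few epochs...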

I'm not sure how big your original source images are, but you can resize them and throw a pre-trained CIFAR-100 net at them (finetuned), or even an ImageNet model if the source images are big enough. Granted, CIFAR/ImageNet models are for colour images, but you could replicate your greyscale channel to each RGB band for fun.
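
Replicating the greyscale band is a one-liner if the images are stored as a (num_images, height, width, 1) NumPy array (the array below is just stand-in data to show the shapes):

  import numpy as np

  grey_images = np.zeros((4283, 20, 20, 1), dtype=np.float32)  # stand-in data
  rgb_images = np.repeat(grey_images, 3, axis=-1)              # -> (4283, 20, 20, 3)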

Mark my words: these steps may "seem simple", but if you can work through them and get some results by finetuning with your own data (even if they're not great results), you can consider yourself a decent NN technician.

One good tutorial for finetuning is on the Caffe website (the Flickr style example, I think); there's got to be one for TF too.

The last step is to design your own CNN. Be careful when changing filter sizes: you need to understand how they affect the outputs of each layer and how information is preserved or lost.
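
To make the filter-size point concrete, here's a quick TF 1.x-style check (illustrative numbers only) of how much a 5x5 filter with VALID padding shrinks a 20x20 input:

  import tensorflow as tf

  x = tf.placeholder(tf.float32, [None, 20, 20, 1])
  w = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
  conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID")
  print(conv.get_shape())  # (?, 16, 16, 32): a 5x5 filter with VALID padding
                           # drops 4 pixels in each spatial dimension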

Another thing to do is "data augmentation" to get yourself some more data: slight rotations, resizing, lighting changes, etc. TF has some nice preprocessing ops for some of this, but some of it you'll need to do yourself.
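
For the TF side, a sketch along these lines (TF 1.x-style image ops, with arbitrary jitter amounts) gives you small brightness/contrast changes and random shifts; rotations you'd have to add yourself:

  import tensorflow as tf

  def augment(image):
      # image: a [20, 20, 1] float tensor
      image = tf.image.random_brightness(image, max_delta=0.2)
      image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
      image = tf.image.resize_image_with_crop_or_pad(image, 24, 24)  # pad out
      image = tf.random_crop(image, [20, 20, 1])                     # random shift
      return image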

Good luck!

0 votes

Your learning rate is way too high. It should be around 0.01; you can experiment around that value, but 0.5 is far too high.

With a high learning rate, the network is likely to get stuck in a configuration and output something fixed, like you observed.


EDIT

It seems the real problem is the unbalanced classes in the dataset. You can try:

  • changing the loss so that examples from the less frequent classes are weighted more heavily
  • changing your sampling strategy to use balanced batches of data: when picking the examples for each batch, sample randomly from the dataset but with the same probability for each class (see the sketch below)
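
A rough sketch of the balanced-batch idea in plain NumPy (integer class labels assumed; the helper name is mine): pick classes uniformly at random, then pick one example of each chosen class.

  import numpy as np

  def balanced_batch(images, labels, batch_size):
      # labels: integer class ids of shape (num_examples,)
      by_class = {c: np.where(labels == c)[0] for c in np.unique(labels)}
      classes = np.random.choice(list(by_class.keys()), size=batch_size)
      idx = np.array([np.random.choice(by_class[c]) for c in classes])
      return images[idx], labels[idx]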
0 votes

Which optimizer are you using? If you've only tried gradient descent, try one of the adaptive ones (e.g. Adagrad, Adadelta or Adam).
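
For example (a tiny TF 1.x-style illustration with a stand-in loss, just to show where the optimizer gets swapped in):

  import tensorflow as tf

  w = tf.Variable(5.0)
  loss = tf.square(w - 2.0)  # stand-in for the network's real cross-entropy loss
  train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
  # Alternatives: tf.train.AdagradOptimizer(0.01), tf.train.AdadeltaOptimizer()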

0 votes

I'm afraid this was a rookie mistake. When turning the data from a folder of images into a single .pickle file, I used:

  imageFileNames = os.listdir(folder)

to get all the image file names in that folder. As it turns out, this returns the file names in an arbitrary order, which means I had matched up my ordered labels with effectively random images.
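
One simple fix (not necessarily the only one) is to sort the names so they line up with the ordered labels:

  import os

  folder = "path/to/training_images"           # hypothetical path
  imageFileNames = sorted(os.listdir(folder))  # deterministic order, matches sorted labels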

The network then found that the best it could do was to give every input image the same output vector, matching the most common training label, 'A'. If I took all the 'A's out of the training data, it did the same with the next most common label, 'E'.

Moral of the story: Always make sure your inputs are what you expect them to be. Just check a few out by sight to make sure they look correct.

A huge thanks to everyone who gave advice, I actually learned loads from this :-)