I'm building a neural net using TensorFlow and Python, and I'm training and testing it on the Kaggle 'First Steps with Julia' dataset. The training images are basically a set of images of different numbers and letters cropped out of Google Street View, from street signs, shop names, etc. The network has 2 fully-connected hidden layers.
The problem I have is that the network very quickly trains itself to give back only one answer: the most common training letter (in my case 'A'). The output is a (62, 1) vector of probabilities, one for each number and letter (upper- and lower-case). This vector is EXACTLY the same for all input images.
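To show what I mean, here is a minimal sketch of the check I do (using numpy; `probs` is a stand-in for the batch of softmax outputs my real network produces, simulated here as identical rows):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate what my network does: every image in a batch of 100 gets
# the exact same (62,) probability vector back.
row = rng.random(62)
probs = np.tile(row, (100, 1))              # 100 identical rows
probs /= probs.sum(axis=1, keepdims=True)   # normalise each row to sum to 1

# If the network has collapsed, every row equals the first one.
collapsed = np.allclose(probs, probs[0])
print(collapsed)  # True, for my real outputs as well
```

With my actual network this check comes back `True` for every batch I feed in.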
I then tried removing all of the 'A's from my input data, at which point the network switched to giving back only the next most common letter (an 'E').
So, is there some way to stop my network getting stuck in a local minimum (not sure if that's the right term)? Is this a general problem for neural networks, or is my network just broken somehow?
I'm happy to provide code if it would help.
EDIT: These are the hyperparameters of my network:
Input size : 400 (20x20 greyscale images)
Hidden layer 1 size : 100
Hidden layer 2 size : 100
Output layer size : 62 (Alphanumeric, lower- and upper-case)
Training data size : 4283 images
Validation data size : 1000 images
Test data size : 1000 images
Batch size : 100
Learning rate : 0.5
Dropout rate : 0.5
L2 regularisation parameter : 0
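The architecture those hyperparameters describe can be sketched like this (a numpy forward pass with random placeholder weights, just to show the shapes; the real weights come from training in TensorFlow, and the ReLU activations are an assumption on my part):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a (62,) logit vector
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)

# Layer shapes from the hyperparameters above; the weight values here
# are random placeholders, not my trained parameters.
W1, b1 = rng.normal(0, 0.1, (400, 100)), np.zeros(100)  # input -> hidden 1
W2, b2 = rng.normal(0, 0.1, (100, 100)), np.zeros(100)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(0, 0.1, (100, 62)), np.zeros(62)    # hidden 2 -> output

def forward(x):
    h1 = np.maximum(0, x @ W1 + b1)   # hidden layer 1 (ReLU assumed)
    h2 = np.maximum(0, h1 @ W2 + b2)  # hidden layer 2 (ReLU assumed)
    return softmax(h2 @ W3 + b3)      # 62-way class probabilities

x = rng.random(400)                   # one flattened 20x20 greyscale image
p = forward(x)
print(p.shape)                        # (62,) -- sums to 1
```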