You're right about the input and output layers.
How can I know how many neurons I need in the hidden layer?
There's no concrete rule that says exactly how many units you need in the hidden layers of a neural network. There are some general guidelines though, which I'll quote from one of my answers on Cross Validated.
Number of input units: Dimension of features x(i)
Number of output units: Number of classes
A reasonable default is one hidden layer. If you use more than one hidden layer, give every hidden layer the same number of units (usually the more the better; anywhere from about 1 to 4 times the number of input units).
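As a rough illustration of these guidelines, here is a minimal sketch that turns them into a list of layer sizes. The function name `suggest_layer_sizes`, its parameters, and the example of 20x20 pixel images (400 features) with 26 letter classes are my own assumptions for illustration; treat the result as a starting point to tune, not a rule.

```python
def suggest_layer_sizes(n_features, n_classes, n_hidden_layers=1, hidden_multiplier=2):
    """Return [input, hidden..., output] sizes following the rules of thumb above.

    hidden_multiplier is a hypothetical knob; the guideline suggests
    roughly 1 to 4 times the number of input units.
    """
    if n_hidden_layers < 1:
        raise ValueError("need at least one hidden layer")
    # Every hidden layer gets the same number of units.
    n_hidden = int(hidden_multiplier * n_features)
    return [n_features] + [n_hidden] * n_hidden_layers + [n_classes]


# Assumed example: 20x20 pixel images (400 features), 26 letters.
print(suggest_layer_sizes(400, 26))
print(suggest_layer_sizes(400, 26, n_hidden_layers=2, hidden_multiplier=1))
```

In practice you would try a few values of the multiplier and compare them on a validation set.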
You also asked:
And what was the purpose of them again?
The hidden layer units just transform the inputs into values (using coefficients selected during training) that can be used by the output layer.
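To make that concrete, here is a small sketch of a forward pass through one hidden layer, written in plain Python. The weights, biases, layer sizes, and the choice of a sigmoid activation are all made-up assumptions for illustration; in a real network the coefficients come from training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of the inputs, adds a bias,
    and applies the sigmoid. `weights` holds one row per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Assumed toy network: 3 inputs -> 4 hidden units -> 2 output units.
x = [0.5, -1.0, 2.0]

W_hidden = [
    [0.1, 0.2, -0.3],
    [0.4, -0.5, 0.6],
    [-0.7, 0.8, 0.9],
    [0.2, -0.1, 0.3],
]
b_hidden = [0.0, 0.1, -0.1, 0.2]

W_out = [
    [0.3, -0.2, 0.5, 0.1],
    [-0.4, 0.6, -0.1, 0.2],
]
b_out = [0.05, -0.05]

# The hidden layer transforms the raw inputs into intermediate values...
hidden = dense(x, W_hidden, b_hidden)
# ...and the output layer only ever sees those transformed values.
output = dense(hidden, W_out, b_out)

print(hidden)
print(output)
```

Note that the output layer never reads `x` directly; it works only from `hidden`.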
Is it how many character classes I want? Say, O and Q are quite similar, so they both would lead to one hidden layer neuron which later tells them apart?
No, that's not right. Hidden units don't correspond to classes; the output units do. The number of output units will be the same as the number of classes you want. Each output unit corresponds to one letter and says whether or not the input image is that letter (with some probability). You select the letter whose output unit has the highest probability.
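Here is a minimal sketch of that last selection step. I'm assuming the output layer produces one raw score per letter and that a softmax turns the scores into probabilities; the softmax, the helper name `predict_letter`, and the example scores are my assumptions, not something your network necessarily uses.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_letter(scores, letters):
    """Return the letter whose output unit has the highest probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return letters[best], probs[best]


# One output unit per class; three classes here to keep the example short.
letters = ["O", "Q", "C"]
scores = [2.0, 1.5, -1.0]  # made-up output-layer scores for one image

letter, prob = predict_letter(scores, letters)
print(letter, prob)
```

Even when O and Q get similar scores, as in this example, each still has its own output unit, and the higher one wins.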