I have built an experimental neural network - the idea being that it can look at a JPEG image and identify which parts of the image are musical notation.
To train the network I have used various images of pages cut into 100 x 100 pixel boxes, each of which is labelled either 1.0 (i.e. contains notation) or 0.0 (does not contain notation).
After training, though, the network seems to have settled on delivering - more or less - a result of 0.5 for every input (giving a squared error of 0.25 whichever label is correct). The sigmoid (logistic) function is used for activation.
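To make the arithmetic concrete, here is a minimal sketch of the activation and the per-example error. The function names `logistic` and `squaredError` are illustrative and are not taken from the repo.

```cpp
#include <cassert>
#include <cmath>

// Logistic (sigmoid) activation: maps any real input into the range (0, 1).
double logistic(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// Squared error for a single output against a label of 0.0 or 1.0.
// (Hypothetical helper; the labels follow the 1.0 / 0.0 scheme above.)
double squaredError(double output, double target)
{
    assert(target == 0.0 || target == 1.0);
    const double diff = target - output;
    return diff * diff;
}
```

An output of 0.5 is what the logistic function returns for a weighted sum of zero, and `squaredError(0.5, 1.0)` and `squaredError(0.5, 0.0)` both come to 0.25, so a constant 0.5 scores the same on music and non-music boxes.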
The network has 10,000 input neurons (one for each pixel of the 100 x 100 image) and 2000 hidden neurons (each input is attached to both a 'row' and a 'column' hidden neuron).
There is one output neuron.
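A rough sketch of a forward pass through a network of this shape follows. It is not the code in the repo. The post does not say how the 2000 hidden neurons are split across 100 rows and 100 columns, so the sketch assumes 10 hidden neurons per row and 10 per column. That split, the bias terms, and every name here (`SketchNet`, `kUnitsPerLine`, and so on) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kSide = 100;         // image is kSide x kSide pixels
constexpr std::size_t kUnitsPerLine = 10;  // ASSUMPTION: 10 hidden neurons per row/column
constexpr std::size_t kHidden = 2 * kSide * kUnitsPerLine;  // 2000 hidden neurons

// Hypothetical layout: the first 1000 hidden neurons each watch one row,
// the remaining 1000 each watch one column.
struct SketchNet {
    std::vector<std::vector<double>> hiddenWeights;  // [hidden][pixel within its line]
    std::vector<double> hiddenBias;                  // ASSUMPTION: neurons have a bias
    std::vector<double> outputWeights;               // one weight per hidden neuron
    double outputBias = 0.0;

    SketchNet()
        : hiddenWeights(kHidden, std::vector<double>(kSide, 0.0)),
          hiddenBias(kHidden, 0.0),
          outputWeights(kHidden, 0.0) {}

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // pixels: row-major greyscale values, kSide * kSide of them.
    double forward(const std::vector<double>& pixels) const
    {
        assert(pixels.size() == kSide * kSide);
        double outSum = outputBias;
        for (std::size_t h = 0; h < kHidden; ++h) {
            const bool isRow = h < kSide * kUnitsPerLine;
            const std::size_t line = (h / kUnitsPerLine) % kSide;  // which row or column
            double sum = hiddenBias[h];
            for (std::size_t k = 0; k < kSide; ++k) {
                // Walk along the row, or down the column, that this neuron watches.
                const std::size_t idx = isRow ? line * kSide + k : k * kSide + line;
                sum += hiddenWeights[h][k] * pixels[idx];
            }
            outSum += outputWeights[h] * sigmoid(sum);
        }
        return sigmoid(outSum);  // the single output neuron
    }
};
```

Under these assumptions each pixel feeds 20 hidden neurons (10 for its row, 10 for its column). With every weight and bias left at zero, `forward` returns exactly 0.5 for any image.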
Would I get better results with two output neurons (i.e. one which activates for 'is music' and one which activates for 'is not music')?
(You can see the C++ source for this here: https://github.com/mcmenaminadrian/musonet - though at any given time what is in the public repo may not be exactly what I am using on the machine.)