
I have built an experimental neural network - the idea being that it can look at a JPEG image and identify which parts of the image are musical notation.

To train the network I have used various images of pages cut into 100 x 100 pixel boxes, each labelled 1.0 (i.e. contains notation) or 0.0 (does not contain notation).
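
Roughly, the tiling works like this (a simplified sketch assuming a row-major greyscale buffer - not the exact code in the repo):

    #include <cstddef>
    #include <vector>

    // Cut a greyscale image (row-major, one byte per pixel) into
    // non-overlapping 100 x 100 tiles. The buffer layout and tile
    // size here are illustrative assumptions.
    std::vector<std::vector<unsigned char>> cutTiles(
        const std::vector<unsigned char>& image,
        std::size_t width, std::size_t height)
    {
        const std::size_t tile = 100;
        std::vector<std::vector<unsigned char>> tiles;
        for (std::size_t y = 0; y + tile <= height; y += tile) {
            for (std::size_t x = 0; x + tile <= width; x += tile) {
                std::vector<unsigned char> box(tile * tile);
                for (std::size_t r = 0; r < tile; ++r)
                    for (std::size_t c = 0; c < tile; ++c)
                        box[r * tile + c] = image[(y + r) * width + (x + c)];
                tiles.push_back(std::move(box));
            }
        }
        return tiles;
    }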

On training the network, though, it seems to have settled into delivering an output of more or less 0.5 every time (giving a squared error of 0.25). The sigmoid (logistic) function is used for activation.
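
That 0.25 figure makes sense: with targets of 1.0 and 0.0, an output stuck at 0.5 scores (1.0 - 0.5)^2 = (0.0 - 0.5)^2 = 0.25 on every example. For reference, the standard definitions (written out here for clarity, not lifted from the repo):

    #include <cmath>

    // Standard logistic (sigmoid) activation, output in (0, 1).
    double sigmoid(double x)
    {
        return 1.0 / (1.0 + std::exp(-x));
    }

    // Squared error for a single output against its target. With the
    // output stuck at 0.5, this is 0.25 whether the target is 1.0 or 0.0.
    double squaredError(double output, double target)
    {
        double diff = output - target;
        return diff * diff;
    }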

The network has 10,000 input neurons (one for each pixel of the 100 x 100 image) and 2,000 hidden neurons (each input is attached to both a 'row' and a 'column' hidden neuron).

There is one output neuron.
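
For concreteness, the forward pass of a single-hidden-layer sigmoid network of this shape looks roughly like the sketch below. The dense wiring here is a simplification - the actual row/column connection scheme is sparser - so this is illustrative, not the repo's code:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Generic forward pass: 10,000 inputs -> hidden layer -> 1 output,
    // sigmoid activation throughout. Dense weights are an assumption.
    double forward(const std::vector<double>& input,                 // 10,000 pixels
                   const std::vector<std::vector<double>>& wHidden,  // hidden x input
                   const std::vector<double>& wOutput)               // one per hidden unit
    {
        auto sigmoid = [](double x) { return 1.0 / (1.0 + std::exp(-x)); };

        std::vector<double> hidden(wHidden.size());
        for (std::size_t j = 0; j < wHidden.size(); ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < input.size(); ++i)
                sum += wHidden[j][i] * input[i];
            hidden[j] = sigmoid(sum);
        }

        double sum = 0.0;
        for (std::size_t j = 0; j < hidden.size(); ++j)
            sum += wOutput[j] * hidden[j];
        return sigmoid(sum);  // single output in (0, 1)
    }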

Would I get better results with two output neurons (i.e. one which activates for 'is music' and one which activates for 'is not music')?

(You can see the C++ source for this here: https://github.com/mcmenaminadrian/musonet - though at any given time what is in the public repo may not be exactly what I am using on the machine.)

aargh - may have just been an issue of a missing minus sign in the code, meaning corrections to the hidden layer were being fought by mistaken corrections to the output layer. Just checking this now. – adrianmcmenamin
I don't know much about the topic at all, but I'd go with one output neuron. If its output is below a certain threshold, it contains 'music', and if it is above another threshold, it isn't 'music'. – eliaspr

1 Answer


FWIW - the actual problem was the sign error in the code described in the comment above: the two layers were fighting one another and, as you might expect, the output converged towards the middle.
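
To see where a single flipped sign does that damage, here are the standard backpropagation deltas for a sigmoid network under squared error (written out for reference, not the repository's exact code). The leading (target - output) term is where a dropped minus sign reverses the output-layer update, sending it in the opposite direction to the hidden-layer corrections:

    // Output-layer delta for a sigmoid unit under squared error:
    // delta = (target - output) * output * (1 - output).
    // Flip the sign of the first factor and the output layer pushes
    // against the hidden layer's corrections, so the network settles
    // in the middle.
    double outputDelta(double output, double target)
    {
        return (target - output) * output * (1.0 - output);
    }

    // Hidden-layer delta propagates the output delta back through the
    // connecting weight, so its sign must agree with the output delta.
    double hiddenDelta(double hiddenOut, double weightToOutput,
                       double deltaOut)
    {
        return weightToOutput * deltaOut * hiddenOut * (1.0 - hiddenOut);
    }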

But ... I based my code on a book from the 1990s - the much-cited "Practical Neural Network Recipes in C++". There is nothing wrong with the book as such (though the C++ reflects the coding style of that time - no use of STL containers and so on), but it also comes from an era when neural nets were not as well understood or engineered as they are today, so the basic design was quite flawed.

I'm now thinking about how best to implement a many-layered convolutional network - not something discussed in the book at all (indeed it dismisses the idea of deep networks, relying instead on the fact that a single-hidden-layer NN is a universal approximator). See the sketch below for the basic building block.
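
As a starting point, the core operation such a network stacks and repeats is a 2D convolution. A naive single-channel, "valid"-padding version (an illustrative sketch, not a design decision) looks like this:

    #include <cstddef>
    #include <vector>

    // Naive "valid" 2D convolution of one greyscale channel with a
    // single k x k kernel - the building block a convolutional layer
    // applies many times over with learned kernels.
    std::vector<double> convolve2d(const std::vector<double>& image,
                                   std::size_t width, std::size_t height,
                                   const std::vector<double>& kernel,
                                   std::size_t k)
    {
        if (width < k || height < k)
            return {};  // image smaller than the kernel: nothing to do

        std::size_t outW = width - k + 1;
        std::size_t outH = height - k + 1;
        std::vector<double> out(outW * outH, 0.0);
        for (std::size_t y = 0; y < outH; ++y)
            for (std::size_t x = 0; x < outW; ++x)
                for (std::size_t r = 0; r < k; ++r)
                    for (std::size_t c = 0; c < k; ++c)
                        out[y * outW + x] +=
                            image[(y + r) * width + (x + c)] * kernel[r * k + c];
        return out;
    }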

I got some interesting results with the single-hidden-layer NN, but it's not really all that useful for image processing.