Neural networks - why everybody has different approach with XOR

Question

Currently I'm trying to learn how to work with neural networks by reading books, but mostly internet tutorials.

I often see that "XOR is 'Hello World' of neural networks".

But here is a thing: The author of one tutorial says that for neural network that calculates XOR value we should use 1 hidden layer with 2 neurons. Also he uses backpropagation with deltas to adjust weights.

I implemented this, but even after 1 million epochs I have a problem that network is stuck with input data 1 and 1. There should be "0" as an answer, but answer is usually 0.5something. I checked my code, it is correct.

If I'll try to add just 1 more neuron in the hidden layer, network is successfully calculating XOR after ~50 000 epochs.

At the same time some people saying that "XOR is not a trivial task and we should use network with 2-3 or more layers". Why?

Come on, if XOR creates so much problems, maybe we shouldn't use it as 'hello world' of neural networks? Please explain what is going on.

Quote: "I checked my code, it is correct." If your own code does not do what you want it to do, by definition it's not correct. Sorry for this distraction. Carry on... :-) — Wouter van Nifterick
Look there are so many opinions on how neural network for xor should look. One of them is that we should use 3 neurons - just like I did. If you have some constructive thoughts - please share them. — user64675

Tryer Tryer · Accepted Answer · 2018-03-04T01:34:27

So neural networks are really interesting. Theres a proof that says that a single perceptron can learn any linear function given enough time. Even more impressive, a neural network with one hidden layer can apparently learn any function, though I've yet to see a proof on that one.

XOR is a good function for teaching neural networks because as CS students, those in the class are likely already familiar with it. In addition, it is not trivial in the sense that a single perceptron can learn it. it isn't linear. See this graphic I put together.

There is no line that separates these values. YET, it is simple enough for humans to understand, and more importantly, that a human can understand the neural network that can solve it. NN are very blackbox-y, it becomes hard to tell why they work really fast. Hell, here is another network config that can solve XOR.

Your example of a more complicated network solving it faster shows the power that comes from combining more neurons and more layers. Its absolutely unnecessary to use 2-3 hidden layers to solve it, but it sure helps speed up the process.

The point is that it is a simple enough problem to solve by human and on a black-board in class, while also being slightly more challenging than a given linear function.

EDIT: Another fantastic example for teaching NNs practically is the MNIST hand drawn digit classification data set. I find that it very easily shows a problem that is simultaneously very simple for humans to understand, very hard to write a non learning program for, and a very practical use case for machine learning. The problem is that the network structure is impossible to draw on a blackboard and trace what is happening in a way practical for a class. XOR achieves this.

EDIT 2: Also, without the code it will probably be hard to diagnose why it isn't converging. Did you write the neurons yourself? What about the optimization function, etc?

EDIT 3: If the output of your function last node is 0.5, try using a step squashing function that makes all values below .5 into 0, and all values above 0.5 into 1. You only have binary output anyway so why bother with a continuous activation on the last node?

Neural networks - why everybody has different approach with XOR

1 Answers