1
votes

I'm having trouble understanding the difference between an MLP and an SLP.

I know that an MLP has more than one layer (the hidden layers) and that its neurons have a non-linear activation function, like the logistic function (needed for gradient descent). But I have read that:

"if all neurons in an MLP had a linear activation function, the MLP could be replaced by a single layer of perceptrons, which can only solve linearly separable problems"

I don't understand why, in the specific case of XOR, which is not linearly separable, the equivalent MLP is a two-layer network in which every neuron has a linear activation function, like the step function. I understand that I need two lines for the separation, but in this case I cannot apply the rule from the previous statement (replacing the MLP with an SLP).

MLP for XOR:

http://s17.postimg.org/c7hwv0s8f/xor.png

In the linked image, neurons A, B and C have a linear activation function (like the step function).

XOR: http://s17.postimg.org/n77pkd81b/xor1.png

1 Answer

4
votes

A linear function has the form f(x) = a x + b. Take another linear function g(z) = c z + d and apply g to the output of f, which is what happens when the output of one linear layer is fed as the input to the next linear layer. Then g(f(x)) = c (a x + b) + d = ac x + cb + d = (ac) x + (cb + d), which is itself another linear function. So stacking linear layers never gives you more than a single linear layer could.
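
To make the algebra concrete, here is a small Python sketch that checks the collapse numerically. The helper name `compose_affine` and the values of a, b, c, d are arbitrary choices for illustration, not anything from the question.

```python
def compose_affine(a, b, c, d):
    """Return (slope, intercept) of g(f(x)), where f(x) = a*x + b and g(z) = c*z + d."""
    return c * a, c * b + d


# Arbitrary example coefficients (assumed values for illustration only).
a, b = 2.0, 1.0    # first "layer":  f(x) = 2x + 1
c, d = -3.0, 4.0   # second "layer": g(z) = -3z + 4

slope, intercept = compose_affine(a, b, c, d)

# Feeding x through both layers gives the same result as the single
# collapsed layer (ac) x + (cb + d).
for x in [-1.0, 0.0, 0.5, 2.0]:
    stacked = c * (a * x + b) + d
    collapsed = slope * x + intercept
    assert abs(stacked - collapsed) < 1e-12
```

The same argument carries over to weight matrices and bias vectors: the product of two matrices is again a matrix, so two linear layers reduce to one.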

The step function is not a linear function: you cannot write it as a x + b. So the statement you quoted does not apply to your XOR network, and that's why an MLP using a step function is strictly more expressive than a single-layer perceptron using a step function.
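
As a rough illustration, here is a two-layer network of step-function neurons that computes XOR. The weights and thresholds are one possible hand-picked choice (an OR unit and an AND unit feeding an output unit); they are assumptions for this sketch and are not necessarily the ones in the linked image.

```python
def step(z):
    """Heaviside step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """Two-layer step-function network for XOR (hand-picked, assumed weights)."""
    h_or = step(x1 + x2 - 0.5)        # hidden unit: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # hidden unit: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # output: "OR but not AND"


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))

# The step function is not of the form a*x + b: any such function maps the
# midpoint of two inputs to the midpoint of their outputs, and step does not.
assert step((-1 + 3) / 2) != (step(-1) + step(3)) / 2
```

Each hidden unit draws one of the two separating lines you mentioned, and the output unit combines them. That combination only works because the step in between is non-linear.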