
From Wikipedia:

If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then it is easily proved with linear algebra that any number of layers can be reduced to the standard two-layer input-output model (see perceptron).

I have seen the claim that a Multilayer Perceptron can be replaced with a Single Layer Perceptron, and what I understood is that this is because a composition of linear functions can itself be expressed as a linear function, and that this is the only reason. Am I right?

So what does the reduction process look like? I.e., if we had a 3x5x2 MLP, what would the SLP look like? Is the size of the input layer based on the number of parameters used to express the linear function, as in the answer linked above?:

f(x) = a x + b

g(z) = c z + d

g(f(x)) = c (a x + b) + d = ac x + cb + d = (ac) x + (cb + d)

So would it have 4 inputs (a, b, c, d, since it is a composition of two linear functions with different parameters)?
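
To make the question concrete, here is a quick check of the scalar composition above (plain Python; the specific values of a, b, c, d are just placeholders):

    # Check that composing f(x) = a*x + b and g(z) = c*z + d
    # yields a single linear function with slope a*c and
    # intercept c*b + d.
    a, b, c, d = 2.0, 1.0, 3.0, -0.5

    def f(x):
        return a * x + b

    def g(z):
        return c * z + d

    for x in (0.0, 1.0, -4.2):
        assert abs(g(f(x)) - ((a * c) * x + (c * b + d))) < 1e-12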

Thanks in advance!


1 Answer


The size will be 3x2, and the hidden layer will simply disappear, with all the weights of the hidden layer's linear functions collapsed into the weights of a single input-to-output layer. In your example there are 3 × 5 = 15 linear functions (input to hidden) plus 5 × 2 = 10 (hidden to output), so 25 different linear functions in total. They are different because the weights in each case are different, so f(x) and g(z) as you describe them are not a correct depiction.

Collapsing the hidden layer can be accomplished by simply taking an input neuron and an output neuron, and taking a linear combination of all the intermediate functions on the connections that link those two neurons through the hidden layer. In the end you will be left with 6 unique functions that describe your 3x2 mapping.
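
A minimal numpy sketch of this collapse, assuming a 3x5x2 MLP with identity activations and biases (all the names here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # 3x5x2 MLP with linear (identity) activations, biases included.
    W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)  # input -> hidden
    W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)  # hidden -> output

    def mlp(x):
        return W2 @ (W1 @ x + b1) + b2

    # Collapse the hidden layer into a single 3x2 linear map.
    W = W2 @ W1          # shape (2, 3)
    b = W2 @ b1 + b2

    x = rng.normal(size=3)
    assert np.allclose(mlp(x), W @ x + b)

The collapsed W has shape 2x3, i.e. exactly the 6 unique functions mentioned above.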

For your own understanding, try doing this on paper with a simple 2x2x1 MLP, with different weights on each connection.
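
If you also want to check that paper exercise mechanically, a sympy sketch along these lines (symbol names are mine, biases omitted for brevity) shows the collapsed weights directly:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    # Input -> hidden weights (2x2) and hidden -> output weights (1x2).
    w11, w12, w21, w22 = sp.symbols('w11 w12 w21 w22')
    v1, v2 = sp.symbols('v1 v2')

    h1 = w11 * x1 + w12 * x2   # hidden neuron 1
    h2 = w21 * x1 + w22 * x2   # hidden neuron 2
    y = sp.expand(v1 * h1 + v2 * h2)

    # Collect the collapsed weight on each input:
    print(y.collect([x1, x2]))
    # -> x1*(v1*w11 + v2*w21) + x2*(v1*w12 + v2*w22)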