1 vote

Sorry that I only keep asking questions here. I will study hard so that I am ready to answer questions too!

Many papers and articles claim that there is no restriction on the choice of activation function for an MLP.

It seems that the only question is which one fits best for the given problem.

The articles also say that it is mathematically proven that a simple perceptron cannot solve the XOR problem.

I know that the simple perceptron model traditionally uses a step function as its activation function.

But if it basically doesn't matter which activation function we use, then using

f(x) = 1 if |x - a| < b
f(x) = 0 if |x - a| > b

as the activation function should work for the XOR problem (for a 2-input, 1-output perceptron model with no hidden layer).
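For example (just a hand-made check with weights, a and b picked by me rather than learned by any algorithm), with unit weights, a = 1 and b = 0.5, applying this "window" function to the plain weighted sum already reproduces the XOR truth table:

```python
# Hand-picked "window" activation from the question: f(s) = 1 if |s - a| < b, else 0.
def window_activation(s, a=1.0, b=0.5):
    return 1 if abs(s - a) < b else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s = 1.0 * x1 + 1.0 * x2          # weighted sum with both weights = 1, bias = 0
    print(x1, x2, "->", window_activation(s))
# prints 0, 1, 1, 0 -- the XOR truth table
```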

I know that using such artificial functions is not good for a learning model. But if it works anyway, then why do the articles say it is proven not to work?

Does the article mean, by "simple perceptron model", one that uses a step function? Or does the activation function of a simple perceptron have to be a step function, unlike in an MLP? Or am I wrong?

The function definitions in your question are invalid (absolute values cannot be negative :-). Please fix and clarify. – Boris Gorelik
Still, both your conditions are the same. Did you mean f(x) = 1 if |x - a| > 0; f(x) = 0 if (x - a) = 0? – Boris Gorelik

2 Answers

1 vote

In general, the problem is that non-differentiable activation functions (like the one you proposed) cannot be used with back-propagation and related techniques. Back-propagation is a convenient way to estimate the correct threshold values (a and b in your example). All the popular activation functions are chosen so that they approximate step behaviour while remaining differentiable.
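A minimal sketch of that point (the sample grid below is just illustrative; a and b are the thresholds from the question):

```python
import numpy as np

def sigmoid(x):
    # Smooth, differentiable approximation of the step function
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # nonzero for every x, so gradient descent gets a learning signal

def window_grad(x, a=1.0, b=0.5):
    # The proposed window activation is piecewise constant, so its derivative is 0
    # wherever it is defined (and undefined at the two jumps): back-propagation
    # gets no signal with which to adjust a, b, or the weights.
    return np.zeros_like(x)

xs = np.linspace(-3.0, 3.0, 7)
print(sigmoid_grad(xs))   # values between ~0.045 and 0.25
print(window_grad(xs))    # all zeros
```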

1 vote

As bgbg mentioned, your activation function is non-differentiable. If you use a differentiable activation function, which MLPs require in order to compute gradients and update the weights, then a single perceptron is simply fitting a line, and a line intuitively cannot solve the nonlinear XOR problem.
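To illustrate (a rough NumPy sketch; the learning rate, seed, and iteration count are arbitrary choices, not from the answer): a single sigmoid unit trained by gradient descent on the four XOR points never gets all of them right, because its decision boundary w1*x1 + w2*x2 + b = 0 is a straight line.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)   # XOR targets

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)
b = 0.0

for _ in range(10_000):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid output of the single unit
    grad_z = (p - y) / len(y)          # gradient of the cross-entropy loss w.r.t. z
    w -= 1.0 * (X.T @ grad_z)          # gradient descent step, learning rate 1.0
    b -= 1.0 * grad_z.sum()

print(np.round(p, 2))                  # all four outputs drift toward 0.5
print(((p > 0.5) == y).mean())         # accuracy never reaches 1.0: a line cannot separate XOR
```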