according to wikipedia, with the delta rule we adjust the weight by:
dw = alpha * (ti-yi)*g'(hj)xi
when alpha = learning constant, ti - true answer, yi - perceptron's guess,g' = the derivative of the activation function g with respect to the weighted sum of the perceptron's inputs, xi - input.
The part that I don't understand in this formula is the multiplication by the derivative g'. let g = sign(x) (the sign of the weighted sum). so g' is always 0, and dw = 0. However, in code examples I saw in the internet, the writers just omitted the g' and used the formula:
dw = alpha * (ti-yi)*(hj)xi
I will be glad to read a proper explanation!
thank you in advance.