I was learning about activation functions in neural networks. The problem with a simple linear activation function is described below:
If A = cx, the derivative with respect to x is c. That means the gradient has no relationship with x: it is a constant gradient, and the descent proceeds along a constant gradient. If there is an error in the prediction, the changes made by backpropagation are constant and do not depend on the change in input, delta(x).
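For concreteness, here is a minimal NumPy sketch (with a hypothetical slope c = 2.0) showing that the gradient of a linear activation A = cx is the same constant for every input:

```python
import numpy as np

c = 2.0  # hypothetical slope of the linear activation

def linear_activation(x):
    return c * x

def linear_activation_grad(x):
    # dA/dx = c for every x: the gradient carries no information about x itself
    return np.full_like(x, c)

x = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
print(linear_activation_grad(x))  # [2. 2. 2. 2. 2.]
```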
The derivative of the ReLU function is also (piecewise) constant. My question is: how can we backpropagate with the ReLU function when its derivative is a constant and does not depend on the change in input?
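For comparison, a minimal ReLU sketch (again plain NumPy; the value of the derivative at x = 0 is a convention, taken here to be 0) showing that its derivative is piecewise constant, 0 or 1 depending on the sign of the input:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 0 for x < 0 and 1 for x > 0; the value at x = 0 is a convention (here 0)
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```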