I am about to implement backpropagation on a neural network that uses ReLU. In a previous project of mine, I did it on a network that used the sigmoid activation function, but now I'm a little confused, since ReLU doesn't have a derivative.

Here's an image showing how weight5 contributes to the total error. In this example, ∂out/∂net = a*(1 - a) if I use the sigmoid function.

What should I write instead of "a*(1 - a)" to make backpropagation work?
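For reference, here is a sketch of the chain-rule decomposition the image presumably shows. It assumes E is the total error, net is the weighted input of the neuron that weight5 (w_5) feeds, out = a is that neuron's activation, and out_h (a name introduced here) is the input that w_5 multiplies. Only the middle factor depends on the activation function:

```latex
\frac{\partial E}{\partial w_5}
  = \frac{\partial E}{\partial \mathrm{out}}
    \cdot \frac{\partial \mathrm{out}}{\partial \mathrm{net}}
    \cdot \frac{\partial \mathrm{net}}{\partial w_5},
\qquad
\frac{\partial \mathrm{out}}{\partial \mathrm{net}} = a(1 - a) \text{ for } a = \sigma(\mathrm{net}),
\qquad
\frac{\partial \mathrm{net}}{\partial w_5} = \mathrm{out}_h .
```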

It depends on the actual ReLU expression; there are several ReLU variants (e.g. leaky ReLU) that can be used. Either way, it's just the derivative of the ReLU function with respect to its argument, and you can compute that by hand or with a tool such as Wolfram Alpha. Or just google it. - zegkljan

3 Answers

since ReLU doesn't have a derivative.

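No, ReLU has a derivative. I assume you are using the ReLU function f(x) = max(0, x). This means that if x <= 0 then f(x) = 0, else f(x) = x. In the first case, when x < 0, the derivative of f(x) with respect to x is f'(x) = 0. In the second case, when x > 0, it's clear that f'(x) = 1. Strictly speaking, f is not differentiable at x = 0 itself, so in practice you just pick a value there (0 or 1 are the usual choices). So in place of a*(1 - a) you use f'(net): 1 if net > 0, 0 otherwise.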
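A minimal NumPy sketch of the swap for a single neuron, assuming net = w5 * out_h + b and a squared error E = 0.5 * (target - out)**2. The helper names and the numbers are hypothetical, not taken from the question. The only change from the sigmoid version is the dout/dnet factor:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # f(x) = max(0, x), elementwise
    return np.maximum(0.0, x)

def relu_prime(x):
    # f'(x) = 1 for x > 0, 0 for x < 0; at x = 0 this sketch picks 0 by convention
    return np.where(x > 0, 1.0, 0.0)

def grad_w5(out_h, w5, b, target, use_relu):
    """dE/dw5 for one neuron: net = w5 * out_h + b, E = 0.5 * (target - out)**2."""
    net = w5 * out_h + b
    if use_relu:
        a = relu(net)
        dout_dnet = relu_prime(net)   # replaces a * (1 - a)
    else:
        a = sigmoid(net)
        dout_dnet = a * (1.0 - a)     # the familiar sigmoid term
    dE_dout = -(target - a)
    dnet_dw5 = out_h
    return dE_dout * dout_dnet * dnet_dw5

print(grad_w5(0.6, 0.4, 0.1, 1.0, use_relu=True))   # net > 0, so f'(net) = 1
print(grad_w5(0.6, -1.0, 0.1, 1.0, use_relu=True))  # net < 0, so the gradient is 0
```

Because out = max(0, net), the test net > 0 is equivalent to out > 0, so the derivative can also be computed from the stored activation, just as a*(1 - a) is for the sigmoid.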

The ReLU derivative is 1 for x >= 0 and 0 for x < 0 (this takes the value 1 at x = 0, where the derivative is strictly undefined).
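A vectorized sketch of this convention (value 1 at x = 0), with a central finite-difference check at points away from 0 where the derivative is well defined. The function names are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_derivative(x):
    # 1 for x >= 0, 0 for x < 0 (this answer's convention at x = 0)
    return np.where(x >= 0, 1.0, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_derivative(x))  # [0. 0. 1. 1. 1.]

# Central finite differences agree with the analytic derivative away from x = 0
h = 1e-6
x_check = np.array([-2.0, -0.5, 0.5, 2.0])
numeric = (relu(x_check + h) - relu(x_check - h)) / (2 * h)
print(np.allclose(numeric, relu_derivative(x_check)))  # True
```

At x = 0 the same symmetric difference gives roughly 0.5, halfway between the left and right slopes, which is another way of seeing that the value there is a choice rather than a true derivative.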

The ReLU derivative can be implemented with NumPy's step function np.heaviside, e.g. np.heaviside(x, 1). The second parameter defines the return value when x == 0, so passing 1 makes the derivative 1 at x = 0.
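A short sketch of both conventions at x = 0, plus how the step function would multiply an upstream gradient during backprop. The arrays net and grad_out are hypothetical values chosen for illustration:

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.0])

# np.heaviside(x1, x2): 0 where x1 < 0, x2 where x1 == 0, 1 where x1 > 0
print(np.heaviside(x, 1))  # [0. 1. 1.]  (derivative taken as 1 at x = 0)
print(np.heaviside(x, 0))  # [0. 0. 1.]  (derivative taken as 0 at x = 0)

# Backprop through ReLU: upstream gradient times the step function of the pre-activation
net = np.array([-1.5, 0.0, 2.0])        # hypothetical pre-activations
grad_out = np.array([0.3, -0.2, 0.7])   # hypothetical dE/d(out) from the next layer
grad_net = grad_out * np.heaviside(net, 0)
```

Which value you pick at exactly 0 rarely matters in practice, since a pre-activation landing on exactly 0.0 is uncommon with real-valued weights.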