I have implemented a neural network with 3 layers: a 784-neuron input layer, a hidden layer with 30 neurons (ReLU activation), and a 10-neuron softmax output layer. I am using the cross-entropy cost function, and no outside libraries are being used. It trains on the MNIST dataset, hence the 784 input neurons and 10 output neurons. With hyperbolic tangent as my hidden-layer activation I get about 96% accuracy, but when I switch to ReLU my activations grow very fast, which causes my weights to grow unbounded as well until everything blows up!
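For reference, here is a minimal sketch of the forward pass I'm describing (written with NumPy just for readability; my actual implementation doesn't use outside libraries, and the scaled Gaussian initialization shown here is only an assumption, not necessarily what my code does):

```python
import numpy as np

# Hypothetical 1/sqrt(fan_in) Gaussian initialization -- shown only as an
# example; my real code may initialize differently.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((784, 30)) / np.sqrt(784)
b1 = np.zeros(30)
W2 = rng.standard_normal((30, 10)) / np.sqrt(30)
b2 = np.zeros(10)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    z1 = X @ W1 + b1      # input -> hidden pre-activation
    a1 = relu(z1)         # ReLU hidden activation (this is what keeps growing)
    z2 = a1 @ W2 + b2     # hidden -> output pre-activation
    return softmax(z2)    # class probabilities

def cross_entropy(probs, y_onehot):
    # mean negative log-likelihood over the batch
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
```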
Is this a common problem to have when using ReLU activation?
I have tried L2 regularization with minimal success. I end up having to set the learning rate a factor of ten lower than with the tanh activation, and even after adjusting the weight-decay rate accordingly, the best accuracy I have gotten is about 90%. In the end the weight decay is still outpaced by the updates to certain weights in the network, which leads to an explosion. It seems everyone else just replaces their activation function with ReLU and gets better results, so I keep looking for bugs and validating my implementation. Is there more that goes into using ReLU as an activation function? Maybe there are problems in my implementation; can someone validate accuracy with the same neural net structure?
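To be concrete about the regularization, the update I'm applying looks roughly like the sketch below. The names eta, lam, and n_train are placeholders I'm using here for the learning rate, L2 coefficient, and training-set size, and the values are only illustrative of the lower learning rate I mentioned, not the exact numbers from my runs:

```python
# Sketch of an L2-regularized (weight-decay) SGD step, with placeholder values.
eta = 0.005        # roughly 10x lower than the learning rate that worked with tanh
lam = 5.0          # example L2 regularization strength
n_train = 50000    # MNIST training-set size

def sgd_step(W, grad_W, batch_size):
    # The decay factor shrinks W toward zero each step, while the gradient
    # term updates it; in my runs the gradient term on certain weights still
    # outpaces the decay and the weights eventually blow up.
    return (1 - eta * lam / n_train) * W - (eta / batch_size) * grad_W
```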