How necessary are activation functions after dense layer in neural networks?

Question

I'm currently training multiple recurrent convolutional neural networks with deep q-learning for the first time.

Input is a 11x11x1 matrix, each network consists of 4 convolutional layer with dimensions 3x3x16, 3x3x32, 3x3x64, 3x3x64. I use stride=1 and padding=1. Each convLayer is followed by ReLU activation. The output is fed into a feedforward fully-connected dense layer with 128 units and after that into an LSTM layer, also containing 128 units. Two following dense layer produce separate advantage and value steams.

So training is running for a couple of days now and now I've realized (after I've read some related paper), I didn't add an activation function after the first dense layer (as in most of the papers). I wonder if adding one would significantly improve my network? Since I'm training the networks for university, I don't have unlimited time for training, because of a deadline for my work. However, I don't have enough experience in training neural networks, to decide on what to do... What do you suggest? I'm thankful for every answer!

Tushar Gupta Tushar Gupta · Accepted Answer · 2020-02-16T17:28:49

If I have to talk in general using an activation function helps you to include some non-linear property in your network.

The purpose of an activation function is to add some kind of non-linear property to the function, which is a neural network. Without the activation functions, the neural network could perform only linear mappings from inputs x to the outputs y. Why is this so?

Without the activation functions, the only mathematical operation during the forward propagation would be dot-products between an input vector and a weight matrix. Since a single dot product is a linear operation, successive dot products would be nothing more than multiple linear operations repeated one after the other. And successive linear operations can be considered as a one single learn operation.

A neural network without any activation function would not be able to realize such complex mappings mathematically and would not be able to solve tasks we want the network to solve.

How necessary are activation functions after dense layer in neural networks?

1 Answers