Here is my scenario.
I used the EMNIST dataset of English capital letters.
My neural network is as follows:
- The input layer has 784 neurons, which are the pixel values of a 28x28 greyscale image divided by 255, so each value is in the range [0, 1].
- The hidden layer has 49 neurons, fully connected to the 784 input neurons.
- The output layer has 9 neurons, one per class.
- The loss function is the cross entropy of the softmax of the output layer.
All weights are initialized as random real numbers from [-1, +1].
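For concreteness, here is a minimal sketch of the setup I described (NumPy; the variable names are my own, and I am assuming a uniform draw for the weights and no bias terms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights drawn at random from [-1, +1]; no bias terms.
W1 = rng.uniform(-1.0, 1.0, size=(784, 49))   # input -> hidden
W2 = rng.uniform(-1.0, 1.0, size=(49, 9))     # hidden -> output

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, activation=np.tanh):
    """X: (N, 784) pixel values already scaled to [0, 1]."""
    h = activation(X @ W1)                     # hidden layer, 49 units
    return softmax(h @ W2)                     # class probabilities, 9 per sample

def cross_entropy(probs, y):
    """probs: (N, 9) softmax outputs; y: (N,) integer class labels."""
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()
```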
I then trained with 500 fixed samples per class: I simply passed the 500x9 images to a train function that uses backpropagation and runs 100 iterations, changing each weight by learning_rate * derivative_of_loss_wrt_corresponding_weight.
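Roughly, the training loop looks like the sketch below (full-batch gradient descent; `X` is the matrix of the 500x9 flattened images, `y` the integer labels, and the function/argument names are my own):

```python
import numpy as np

def train(X, y, W1, W2, activation, activation_grad, lr=1e-4, epochs=100):
    """Full-batch gradient descent; X is (N, 784), y is (N,) integer labels in [0, 9)."""
    N = len(X)
    Y = np.eye(9)[y]                                  # one-hot labels, (N, 9)
    for _ in range(epochs):
        # Forward pass.
        h_pre = X @ W1                                # (N, 49)
        h = activation(h_pre)
        logits = h @ W2                               # (N, 9)
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Backward pass: for softmax + cross entropy, dLoss/dlogits = probs - Y.
        d_logits = (probs - Y) / N
        dW2 = h.T @ d_logits                          # (49, 9)
        d_hidden = (d_logits @ W2.T) * activation_grad(h_pre)
        dW1 = X.T @ d_hidden                          # (784, 49)
        # Change each weight by learning_rate * derivative_of_loss_wrt_that_weight.
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W1, W2
```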
I found that with a tanh activation on the hidden neurons, the network learns faster than with ReLU at a learning rate of 0.0001.
I concluded this because the accuracy on a fixed test dataset was higher for tanh than for ReLU, and the loss value after 100 epochs was slightly lower for tanh.
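For reference, these are the two hidden-layer activations I am comparing, with the derivatives used in backprop (a sketch; they would be passed as the `activation` / `activation_grad` arguments of the training sketch above):

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2               # derivative of tanh

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(z.dtype)              # 1 where z > 0, else 0
```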
Isn't ReLU expected to perform better?