I understand the role of the bias node in neural nets, and why it is important for shifting the activation function in small networks. My question is this: is the bias still important in very large networks (more specifically, a convolutional neural network for image recognition using the ReLU activation function, with 3 convolutional layers, 2 hidden layers, and over 100,000 connections), or does its effect get lost among the sheer number of activations?
The reason I ask is that in the past I have built networks in which I forgot to implement a bias node, yet upon adding one I saw a negligible difference in performance. Could this have been down to chance, in that the specific data set did not require a bias? Do I need to initialise the bias with a larger value in large networks? Any other advice would be much appreciated.
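To make concrete what I mean by the bias "shifting" the activation, here is a minimal sketch of a single ReLU unit in plain Python. The names `relu` and `neuron`, the weight, and the bias value are purely illustrative and are not taken from my actual network.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def neuron(x, w, b=0.0):
    """A single ReLU unit computing relu(w * x + b)."""
    return relu(w * x + b)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = 1.0  # illustrative weight

# Without a bias the unit starts responding at x = 0, whatever w is.
no_bias = [neuron(x, w) for x in xs]

# With b = 1.0 the point where the unit switches on moves to x = -b / w = -1.
with_bias = [neuron(x, w, b=1.0) for x in xs]

print(no_bias)    # [0.0, 0.0, 0.0, 1.0, 2.0]
print(with_bias)  # [0.0, 0.0, 1.0, 2.0, 3.0]
```

For a single unit like this, the bias is the only thing that can move the switch-on point away from zero. What I am unsure about is whether that still matters once there are thousands of such units.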