I am trying to implement Batch Normalization (http://arxiv.org/pdf/1502.03167.pdf) in my convolutional neural network, but I am confused about which axis I should compute the mean and variance over.
Suppose the input to a conv layer has shape 3 * 224 * 224 * 32,
where:
3 - number of input channels
224 * 224 - shape of a single channel
32 - minibatch size
What should the axis be in the following formula?
Mean = numpy.mean(input_layer, axis= ? )
Similarly, suppose the input to a fully connected layer has shape 100 * 32,
where:
100 - number of inputs
32 - minibatch size
Again, what should the axis be in the following formula?
Mean = numpy.mean(input_layer, axis= ? )
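
To make both cases concrete, here is a minimal sketch of the setup. It assumes one reading of the paper: a conv layer gets one mean and variance per channel (pooled over the minibatch and both spatial dimensions), and a fully connected layer gets one per input (pooled over the minibatch only). The array names, the random data, and the eps value are hypothetical, and the learned scale and shift parameters are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-5  # assumed small constant for numerical stability

# Conv-layer input laid out as (channels, height, width, minibatch), as above.
conv_input = rng.standard_normal((3, 224, 224, 32))

# Assumed reading: one statistic per channel, so reduce over
# height, width, and minibatch (axes 1, 2, 3).
conv_mean = np.mean(conv_input, axis=(1, 2, 3), keepdims=True)  # shape (3, 1, 1, 1)
conv_var = np.var(conv_input, axis=(1, 2, 3), keepdims=True)    # shape (3, 1, 1, 1)
conv_normalized = (conv_input - conv_mean) / np.sqrt(conv_var + eps)

# Fully connected input laid out as (inputs, minibatch), as above.
fc_input = rng.standard_normal((100, 32))

# Assumed reading: one statistic per input, so reduce over the minibatch (axis 1).
fc_mean = np.mean(fc_input, axis=1, keepdims=True)  # shape (100, 1)
fc_var = np.var(fc_input, axis=1, keepdims=True)    # shape (100, 1)
fc_normalized = (fc_input - fc_mean) / np.sqrt(fc_var + eps)

print(conv_mean.shape, fc_mean.shape)
```

With keepdims=True the statistics keep singleton axes, so they broadcast against the original arrays without any reshaping. If the minibatch axis came first instead of last, the axis arguments would have to change accordingly.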