My question is: what exactly is being normalized by BatchNormalization (BN)?
That is, does BN normalize the channels for each pixel separately, or across all the pixels together? And does it do this on a per-image basis, or across all the channels of the entire batch?
Specifically, BN operates on X. Say X.shape = [m, h, w, c]. With axis=3 it operates on the "c" dimension, which is the number of channels (e.g. 3 for RGB) or the number of feature maps.
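For concreteness, this is roughly the setup I have in mind (a tf.keras sketch; the shapes and values are just illustrative):

```python
import numpy as np
import tensorflow as tf

# Illustrative shapes only: a batch of m=4 RGB images, each 32x32 pixels.
m, h, w, c = 4, 32, 32, 3
X = np.random.rand(m, h, w, c).astype("float32")

# channels_last data, so the features axis is the last one (axis=3).
bn = tf.keras.layers.BatchNormalization(axis=3)
Y = bn(X, training=True)
print(Y.shape)  # (4, 32, 32, 3)
```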
So let's say X is an RGB image and thus has 3 channels. Does BN do the following? (This is a simplified version of BN, just to discuss the dimensional aspects. I understand that gamma and beta are learned, but I am not concerned with that here.)
For each image X in the batch of m images:
- For each pixel (h, w), take the mean of the associated r, g, and b values.
- For each pixel (h, w), take the variance of the associated r, g, and b values.
- Do r = (r - mean)/var, g = (g - mean)/var, and b = (b - mean)/var, where r, g, and b are the red, green, and blue channels of X respectively.
- Then repeat this process for the next image in the batch (a sketch of this procedure follows below).
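To make the steps above concrete, here is a minimal NumPy sketch of the procedure I am describing. The function name and the eps term are my own additions, and this is just my interpretation, not necessarily what BN actually does:

```python
import numpy as np

def per_pixel_channel_norm(X, eps=1e-5):
    # X has shape [m, h, w, c]; normalize each pixel across its c channel values,
    # independently for every image in the batch.
    mean = X.mean(axis=-1, keepdims=True)  # shape [m, h, w, 1]: one mean per pixel
    var = X.var(axis=-1, keepdims=True)    # shape [m, h, w, 1]: one variance per pixel
    return (X - mean) / (var + eps)        # the simplified formula from the list above

X = np.random.rand(4, 32, 32, 3)
print(per_pixel_channel_norm(X).shape)  # (4, 32, 32, 3)
```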
In Keras, the docs for BatchNormalization say:
axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
But what exactly is it doing along each dimension?
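In other words, these are the two readings of axis=3 that I am trying to distinguish (plain NumPy, shapes only for illustration):

```python
import numpy as np

X = np.random.rand(4, 32, 32, 3)  # [m, h, w, c]

# Reading A: statistics per image and per pixel, taken across the c channels.
mean_a = X.mean(axis=3, keepdims=True)          # shape (4, 32, 32, 1)

# Reading B: statistics per channel, taken across the batch and all pixels.
mean_b = X.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)

print(mean_a.shape, mean_b.shape)
```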