I am newbie in convolutional neural networks and just have idea about feature maps and how convolution is done on images to extract features. I would be glad to know some details on applying batch normalisation in CNN.
I read this paper https://arxiv.org/pdf/1502.03167v3.pdf and could understand the BN algorithm applied on a data but in the end they mentioned that a slight modification is required when applied to CNN:
For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a mini- batch, over all locations. In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations – so for a mini-batch of size m and feature maps of size p × q, we use the effec- tive mini-batch of size m′ = |B| = m · pq. We learn a pair of parameters γ(k) and β(k) per feature map, rather than per activation. Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.
I am total confused when they say "so that different elements of the same feature map, at different locations, are normalized in the same way"
I know what feature maps mean and different elements are the weights in every feature map. But I could not understand what location or spatial location means.
I could not understand the below sentence at all "In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations"
I would be glad if someone cold elaborate and explain me in much simpler terms