As described in the original batch normalization paper, batch normalization of a 1-D feature (for example, the output of a fully connected layer) differs from that of a 2-D feature (for example, the output of a convolutional layer) in a nontrivial way.
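To make the difference concrete, here is a small numpy sketch (my own illustration, not from the paper's code): for a 1-D feature of shape `(N, C)` the statistics are computed per feature over the batch axis only, while for a 2-D feature map of shape `(N, H, W, C)` they are computed per channel over the batch *and* spatial axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D feature from a fully connected layer: shape (N, C).
# One mean/variance per feature, reduced over the batch axis only.
x_fc = rng.normal(size=(8, 16))
mean_fc = x_fc.mean(axis=0)                    # shape (16,)
var_fc = x_fc.var(axis=0)
y_fc = (x_fc - mean_fc) / np.sqrt(var_fc + 1e-5)

# 2-D feature map from a convolutional layer: shape (N, H, W, C).
# One mean/variance per channel, reduced over batch AND spatial axes,
# so all spatial locations of a channel share the same statistics.
x_conv = rng.normal(size=(8, 4, 4, 16))
mean_conv = x_conv.mean(axis=(0, 1, 2))        # shape (16,)
var_conv = x_conv.var(axis=(0, 1, 2))
y_conv = (x_conv - mean_conv) / np.sqrt(var_conv + 1e-5)

print(y_fc.shape, y_conv.shape)
```

So the two cases differ in which axes the moments are reduced over, even though the per-element normalization formula is the same.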
The TensorFlow library provides an easy way to batch normalize a 1-D feature via `tf.contrib.layers.batch_norm`, but I'm not sure whether the same holds for the 2-D case.
I don't fully understand this function, but can it be applied to 2-D batch normalization as well?
I have seen people use it on 2-D feature maps (with multiple channels): example 1 (link 1, link 2).