I was looking at the official batch normalization (BN) layer in TensorFlow, but it didn't really explain how to use it for a convolutional layer. Does someone know how to do this? In particular, it's important that BN applies and learns the same parameters per feature map (rather than per activation); in other words, that it applies and learns BN per filter.
As a specific toy example, say I want to do conv2d with BN on MNIST (essentially 2D data). One could do:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])  # flattened MNIST images
W_conv1 = weight_variable([5, 5, 1, 32])  # 32 filters of size 5x5 (weight_variable as in the TF MNIST tutorial)
x_image = tf.reshape(x, [-1, 28, 28, 1])  # MNIST image
conv = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='VALID')  # [?,24,24,32]
z = conv  # [?,24,24,32]
z = BN(z)  # [?,24,24,32]; essentially only 32 different scale and shift parameters to learn, one per filter
a = tf.nn.relu(z)  # [?,24,24,32]
where
z = BN(z)
applies BN to each feature map produced by each individual filter. In pseudocode, the convolution computes:
x_patch = x[h:h+5, w:w+5, :] # patch the filter is applied to
z[h,w,f] = sum(x_patch * W[:,:,:,f]) # element-wise product with filter f, summed over the patch (a dot product, not a matrix multiplication)
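To convince myself the pseudocode is right, here is a tiny numerical check (a sketch; the location h, w and filter f are arbitrary choices for illustration):

import numpy as np
import tensorflow as tf

x_np = np.random.rand(1, 28, 28, 1).astype(np.float32)
W_np = np.random.rand(5, 5, 1, 32).astype(np.float32)

conv = tf.nn.conv2d(tf.constant(x_np), tf.constant(W_np),
                    strides=[1, 1, 1, 1], padding='VALID')
with tf.Session() as sess:
    out = sess.run(conv)  # shape [1, 24, 24, 32]

h, w, f = 3, 7, 11                         # arbitrary output location and filter
patch = x_np[0, h:h+5, w:w+5, :]           # the 5x5x1 input patch
manual = np.sum(patch * W_np[:, :, :, f])  # element-wise product, summed
print(np.allclose(out[0, h, w, f], manual, atol=1e-4))  # expect True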
and then a proper batch norm layer is applied to it (in pseudocode, omitting important details):
z[h,w,f] = BN(z[h,w,f]) = scale[f] * (z[h,w,f] - mu[f]) / sigma[f] + shift[f]
i.e. we apply BN once per filter f, with mu[f] and sigma[f] computed over the batch and both spatial dimensions.
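For reference, here is a minimal sketch of what I think BN(z) should look like in the convolutional case, normalizing with tf.nn.moments over the batch and spatial axes (training-time statistics only; the moving averages needed at test time are omitted, and the name BN and the eps value are my own):

def BN(z, eps=1e-5):
    # z: [batch, height, width, n_filters]; normalizing over the batch and
    # spatial dimensions gives one mean/variance per filter
    n_filters = z.get_shape().as_list()[-1]
    mu, var = tf.nn.moments(z, axes=[0, 1, 2])  # each of shape [n_filters]
    scale = tf.Variable(tf.ones([n_filters]))   # gamma, one per filter
    shift = tf.Variable(tf.zeros([n_filters]))  # beta, one per filter
    return tf.nn.batch_normalization(z, mu, var, shift, scale, eps)

If I understand correctly, axes=[0, 1, 2] is what makes the statistics per feature map rather than per activation. Is this the right way to get the 32 scale/shift pairs?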