I was looking at the official batch normalization (BN) layer in TensorFlow, but it didn't really explain how to use it for a convolutional layer. Does someone know how to do this? In particular, it's important that BN applies and learns the same parameters per feature map (rather than per activation); in other words, that it applies and learns BN per filter.
As a specific toy example, say I want to do conv2d with BN on MNIST (essentially 2D data). One could do:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])  # flattened MNIST images
W_conv1 = weight_variable([5, 5, 1, 32])  # 32 filters of size 5x5 (weight_variable as in the TF MNIST tutorial)
x_image = tf.reshape(x, [-1, 28, 28, 1])  # MNIST image
conv = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='VALID')  # [?,24,24,32]
z = conv  # [?,24,24,32]
z = BN(z)  # [?,24,24,32]; essentially only 32 different scale and shift parameters to learn, one per filter
a = tf.nn.relu(z)  # [?,24,24,32]
where
z = BN(z)
applies BN to each feature map produced by each individual filter. In pseudocode, the convolution computes:
x_patch = x[h:h+5, w:w+5, :] # patch the filter is applied to
z[h,w,f] = sum(x_patch * W[:,:,:,f]) # element-wise product with filter f, summed over the patch (a dot product, not a matrix multiplication)
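To convince myself the pseudocode is right, here is a tiny numerical check (a sketch; the location h, w and filter f are arbitrary choices for illustration):

import numpy as np
import tensorflow as tf

x_np = np.random.rand(1, 28, 28, 1).astype(np.float32)
W_np = np.random.rand(5, 5, 1, 32).astype(np.float32)

conv = tf.nn.conv2d(tf.constant(x_np), tf.constant(W_np),
                    strides=[1, 1, 1, 1], padding='VALID')
with tf.Session() as sess:
    out = sess.run(conv)  # shape [1, 24, 24, 32]

h, w, f = 3, 7, 11                         # arbitrary output location and filter
patch = x_np[0, h:h+5, w:w+5, :]           # the 5x5x1 input patch
manual = np.sum(patch * W_np[:, :, :, f])  # element-wise product, summed
print(np.allclose(out[0, h, w, f], manual, atol=1e-4))  # expect True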
and then a proper batch norm layer is applied to it (in pseudocode, omitting important details):
z[h,w,f] = BN(z[h,w,f]) = scale[f] * (z[h,w,f] - mu[f]) / sigma[f] + shift[f]
i.e. we apply BN once per filter f, with mu[f] and sigma[f] computed over the batch and both spatial dimensions.
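For reference, here is a minimal sketch of what I think BN(z) should look like in the convolutional case, normalizing with tf.nn.moments over the batch and spatial axes (training-time statistics only; the moving averages needed at test time are omitted, and the name BN and the eps value are my own):

def BN(z, eps=1e-5):
    # z: [batch, height, width, n_filters]; normalizing over the batch and
    # spatial dimensions gives one mean/variance per filter
    n_filters = z.get_shape().as_list()[-1]
    mu, var = tf.nn.moments(z, axes=[0, 1, 2])  # each of shape [n_filters]
    scale = tf.Variable(tf.ones([n_filters]))   # gamma, one per filter
    shift = tf.Variable(tf.zeros([n_filters]))  # beta, one per filter
    return tf.nn.batch_normalization(z, mu, var, shift, scale, eps)

If I understand correctly, axes=[0, 1, 2] is what makes the statistics per feature map rather than per activation. Is this the right way to get the 32 scale/shift pairs?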