I am playing with the BatchNormalization layer, and I can't quite figure out the numerical results I get.
Let's say we use BatchNormalization for computer vision, so we work with 4D tensors whose dimensions are (batch size, image height, image width, channels).
If I understand correctly, what BatchNormalization will do is:
- At training time:
- for each batch, compute the mean MU and the variance SIGMA. This is done per channel, across all rows and all columns of all images in the batch.
- keep an exponential moving average of MU (say MÛ) and of SIGMA (say SIĜMA) across all batches
- use MÛ and SIĜMA to normalize pixels: normalized_pixel = ((input_pixel - MÛ) / sqrt(SIĜMA))
- a hyper-parameter epsilon is added to SIĜMA to prevent division by zero if SIĜMA becomes zero at some point during training: normalized_pixel = ((input_pixel - MÛ) / sqrt(SIĜMA + epsilon))
- use a scale parameter GAMMA and an offset parameter BETA to re-scale the normalized pixel: output_pixel = ((GAMMA x normalized_pixel) + BETA)
- GAMMA and BETA are trainable parameters; they are optimized during training
- At inference time:
- MÛ and SIĜMA are now fixed parameters, just like GAMMA and BETA
- Same computations apply (see the numpy sketch just after this list)
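To be explicit about what I mean, here is a minimal numpy sketch of that inference-time formula as I understand it (the function and variable names are mine, not Keras internals, and I assume a channels-last layout):
import numpy

def batchnorm_inference(x, moving_mean, moving_variance, gamma, beta, epsilon):
    # x has shape (batch, height, width, channels); the four parameters each have
    # shape (channels,) and broadcast over the last axis
    normalized = (x - moving_mean) / numpy.sqrt(moving_variance + epsilon)
    return (gamma * normalized) + beta

# At training time, the per-batch statistics would be reduced per channel,
# i.e. over the batch, height and width axes:
# batch_mean = x.mean(axis=(0, 1, 2))
# batch_variance = x.var(axis=(0, 1, 2))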
Now, here comes my question...
First, I am only interested in what happens at inference time. I don't care about training, and I consider MÛ, SIĜMA, GAMMA and BETA to be fixed parameters.
I wrote a piece of Python code to test BatchNormalization on a (1, 3, 4, 1) tensor. Since there is only one channel, MÛ, SIĜMA, GAMMA and BETA each have a single element.
I chose MÛ = 0.0, SIĜMA = 1.0, GAMMA = 1.0 and BETA = 0.0, so that BatchNormalization should have no effect.
Here is the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy
import keras
import math
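# a single 3x4 image with one channel: shape (batch, height, width, channels) = (1, 3, 4, 1)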
input_batch = numpy.array(
[[
[[ 1.0], [ 2.0], [ 3.0], [ 4.0]],
[[ 5.0], [ 6.0], [ 7.0], [ 8.0]],
[[ 9.0], [10.0], [11.0], [12.0]]
]],
dtype=numpy.float32
)
MU = 0.0
SIGMA = 1.0
GAMMA = 1.0
BETA = 0.0
input_layer = keras.layers.Input(
shape = (
None,
None,
1
)
)
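# axis=-1: statistics and parameters are per channel (the last dimension)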
BatchNormalization_layer = keras.layers.BatchNormalization(
axis=-1,
#epsilon=0.0,
center=True,
scale=True
)(
input_layer
)
model = keras.models.Model(
inputs = [input_layer],
outputs = [BatchNormalization_layer]
)
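# weight order for BatchNormalization with scale=True and center=True:
# [gamma, beta, moving_mean, moving_variance], i.e. [GAMMA, BETA, MÛ, SIĜMA]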
model.layers[1].set_weights(
(
numpy.array([GAMMA], dtype=numpy.float32),
numpy.array([BETA], dtype=numpy.float32),
numpy.array([MU], dtype=numpy.float32),
numpy.array([SIGMA], dtype=numpy.float32),
)
)
print(model.predict(input_batch))
print((((input_batch - MU) / math.sqrt(SIGMA)) * GAMMA) + BETA)
When I write the computation explicitly with numpy, ((((input_batch - MU) / math.sqrt(SIGMA)) * GAMMA) + BETA), I get the expected result.
However, when I use the keras.layers.BatchNormalization layer to perform the computation, I get similar values, but with what looks like some kind of rounding error or imprecision:
Using TensorFlow backend.
[[[[ 0.9995004]
[ 1.9990008]
[ 2.9985013]
[ 3.9980016]]
[[ 4.997502 ]
[ 5.9970026]
[ 6.996503 ]
[ 7.996003 ]]
[[ 8.995503 ]
[ 9.995004 ]
[10.994504 ]
[11.994005 ]]]]
[[[[ 1.]
[ 2.]
[ 3.]
[ 4.]]
[[ 5.]
[ 6.]
[ 7.]
[ 8.]]
[[ 9.]
[10.]
[11.]
[12.]]]]
When I play with the values of MÛ, SIĜMA, GAMMA and BETA, the output is affected as expected, so I believe I am providing the parameters to the layer correctly.
I also tried setting the layer's hyper-parameter epsilon to 0.0. It changes the results a little bit, but does not solve the issue.
Using TensorFlow backend.
[[[[ 0.999995 ]
[ 1.99999 ]
[ 2.999985 ]
[ 3.99998 ]]
[[ 4.999975 ]
[ 5.99997 ]
[ 6.9999647]
[ 7.99996 ]]
[[ 8.999955 ]
[ 9.99995 ]
[10.999945 ]
[11.99994 ]]]]
[[[[ 1.]
[ 2.]
[ 3.]
[ 4.]]
[[ 5.]
[ 6.]
[ 7.]
[ 8.]]
[[ 9.]
[10.]
[11.]
[12.]]]]
Can someone explain what is going on?
Thanks,
Julien