TensorFlow ReLU doesn't work?

Question

I have written a convolutional network in TensorFlow with ReLU as the activation function, but it is not learning: the loss stays constant on both the training and evaluation datasets. With other activation functions, everything works as expected.

Here is the code that creates the network:

def _create_nn(self):
    current = tf.layers.conv2d(self.input, 20, 3, activation=self.activation)
    current = tf.layers.max_pooling2d(current, 2, 2)
    current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
    current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
    current = tf.layers.max_pooling2d(current, 2, 2)
    self.descriptor = current = tf.layers.conv2d(current, 32, 5, activation=self.activation)
    if not self.drop_conv:
        current = tf.layers.conv2d(current, self.layer_7_filters_n, 3, activation=self.activation)
    if self.add_conv:
        current = tf.layers.conv2d(current, 48, 2, activation=self.activation)

    self.descriptor = current

    last_conv_output_shape = current.get_shape().as_list()
    self.descr_size = last_conv_output_shape[1] * last_conv_output_shape[2] * last_conv_output_shape[3]

    current = tf.layers.dense(tf.reshape(current, [-1, self.descr_size]), 100, activation=self.activation)
    current = tf.layers.dense(current, 50, activation=self.last_activation)

    return current

self.activation is set to tf.nn.relu, and self.last_activation is set to tf.nn.softmax.

The loss function and optimizer are created here:

    self._nn = self._create_nn()

    self._loss_function = tf.reduce_sum(tf.squared_difference(self._nn, self.Y), 1)

    optimizer = tf.train.AdamOptimizer()
    self._train_op = optimizer.minimize(self._loss_function)

I tried changing the variable initialization by passing tf.random_normal_initializer(0.1, 0.1) as the initializer, but the loss did not change at all.

I would be grateful for help in making this neural network work with ReLU.

Edit: Leaky ReLU has the same problem.

Edit: Here is a small example that reproduces the same problem:

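import tensorflow as tf  # TensorFlow 1.x API (tf.layers)

x = tf.constant([[3., 211., 123., 78.]])
v = tf.Variable([0.5, 0.5, 0.5, 0.5])  # not used below
h_d = tf.layers.Dense(4, activation=tf.nn.leaky_relu)
h = h_d(x)
y_d = tf.layers.Dense(4, activation=tf.nn.softmax)
y = y_d(h)
d = tf.constant([[.5, .5, 0, 0]])
# Assumed loss: the same squared difference as in the full model above
loss = tf.reduce_sum(tf.squared_difference(y, d), 1)
grads = tf.gradients(loss, [h_d.kernel, h_d.bias, y_d.kernel, y_d.bias])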

The gradients (as calculated with tf.gradients) of the h_d and y_d kernels and biases are all either exactly 0 or close to 0.

root root · Accepted Answer · 2018-06-13T15:04:27

In an improbable but possible case, all pre-activations in some layer are negative for every sample. The ReLU sets them all to zero, and because the ReLU's gradient is zero for negative inputs, no gradient flows back through that layer and training makes no progress.
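A minimal pure-Python sketch of this failure mode, using made-up numbers for one unit and one of its weights over a batch of four samples (the pre-activations, inputs, and upstream gradients are all hypothetical): when every pre-activation is negative, every ReLU output is zero, and the chain-rule gradient for the weight sums to zero.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (this sketch uses 0 at z == 0).
    return 1.0 if z > 0 else 0.0

# Hypothetical values for one unit and one of its weights over a batch of 4 samples.
pre_acts = [-0.7, -2.3, -0.1, -5.0]  # every pre-activation is negative
inputs = [1.5, 0.2, 3.0, 0.8]        # the input multiplied by this weight, per sample
upstream = [0.4, -1.2, 0.9, 0.3]     # dLoss/dOutput coming back from the next layer

outputs = [relu(z) for z in pre_acts]
# Chain rule, summed over the batch: dLoss/dw = sum(upstream * relu'(z) * x)
grad_w = sum(g * relu_grad(z) * x for g, z, x in zip(upstream, pre_acts, inputs))

print(outputs)  # [0.0, 0.0, 0.0, 0.0]
print(grad_w)   # zero: no gradient reaches this weight
```

Whatever the upstream gradients are, the factor relu'(z) = 0 wipes them out, so the optimizer has nothing to follow.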

Factors that make this more likely are a small dataset, badly scaled input features, inappropriate weight initialization, and/or few channels in intermediate layers.

Here you use random_normal_initializer with mean=0.1, so most weights are positive. If your inputs happen to be all negative, they are mapped to negative pre-activations, which the ReLU then sets to zero. Try mean=0, or rescale the input features.
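To make the initializer and scaling argument concrete, here is a rough pure-Python sketch under simplifying assumptions: the input batch is hypothetical and all negative, and every weight is fixed at 0.1 (the mean of tf.random_normal_initializer(0.1, 0.1)) instead of being sampled. With the raw inputs every ReLU output is zero; after standardizing each feature to zero mean and unit variance, some pre-activations become positive.

```python
import math

def relu(z):
    return max(0.0, z)

def preactivation(row, weights, bias=0.0):
    # Weighted sum of one sample's features, as a single dense unit computes it.
    return sum(w * x for w, x in zip(weights, row)) + bias

def standardize(rows):
    """Rescale each feature (column) to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical all-negative input features: 3 samples, 2 features each.
raw = [[-3.0, -211.0], [-5.0, -123.0], [-1.0, -78.0]]
# Simplification: every weight sits exactly at the initializer's mean of 0.1.
weights = [0.1, 0.1]

raw_out = [relu(preactivation(r, weights)) for r in raw]
scaled = standardize(raw)
scaled_out = [relu(preactivation(r, weights)) for r in scaled]

print(raw_out)     # [0.0, 0.0, 0.0]: the unit is dead for the whole batch
print(scaled_out)  # at least one sample now gives a positive output
```

In a real pipeline, the means and standard deviations would be computed on the training set and then reused unchanged for the evaluation set.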

You can also try a Leaky ReLU. It may also be that the learning rate is too small or too large.
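The reason Leaky ReLU can help is sketched below with the same kind of hypothetical batch (all numbers are made up). ReLU's gradient is zero for negative pre-activations, while Leaky ReLU keeps a small slope alpha there, so some gradient still reaches the weight. alpha=0.2 is used here because it is the default of tf.nn.leaky_relu.

```python
def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.2):
    # Negative inputs are scaled by alpha instead of being set to zero.
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.2):
    return 1.0 if z > 0 else alpha

# Hypothetical batch: every pre-activation of this unit is negative.
pre_acts = [-0.7, -2.3, -0.1, -5.0]
inputs = [1.5, 0.2, 3.0, 0.8]     # the input multiplied by this weight, per sample
upstream = [0.4, -1.2, 0.9, 0.3]  # dLoss/dOutput from the next layer

def weight_grad(act_grad):
    # Chain rule, summed over the batch: sum(upstream * act'(z) * x)
    return sum(g * act_grad(z) * x for g, z, x in zip(upstream, pre_acts, inputs))

leaky_outputs = [leaky_relu(z) for z in pre_acts]  # small but non-zero
relu_grad_w = weight_grad(relu_grad)               # zero: the unit is dead
leaky_grad_w = weight_grad(leaky_relu_grad)        # non-zero: learning can continue

print(leaky_outputs)
print(relu_grad_w, leaky_grad_w)
```

Note that a non-zero gradient alone does not guarantee progress: if the gradient is tiny for other reasons, the symptoms can look the same, which is consistent with Leaky ReLU not helping in the question's small example.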
