I have written a convolutional network in tensorflow with relu as an activation function, however it is not learning (loss is constant for both eval and train data set). For different activation functions everything works as it should.
Here is code where the nn is created:
def _create_nn(self):
current = tf.layers.conv2d(self.input, 20, 3, activation=self.activation)
current = tf.layers.max_pooling2d(current, 2, 2)
current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
current = tf.layers.conv2d(current, 24, 3, activation=self.activation)
current = tf.layers.max_pooling2d(current, 2, 2)
self.descriptor = current = tf.layers.conv2d(current, 32, 5, activation=self.activation)
if not self.drop_conv:
current = tf.layers.conv2d(current, self.layer_7_filters_n, 3, activation=self.activation)
if self.add_conv:
current = tf.layers.conv2d(current, 48, 2, activation=self.activation)
self.descriptor = current
last_conv_output_shape = current.get_shape().as_list()
self.descr_size = last_conv_output_shape[1] * last_conv_output_shape[2] * last_conv_output_shape[3]
current = tf.layers.dense(tf.reshape(current, [-1, self.descr_size]), 100, activation=self.activation)
current = tf.layers.dense(current, 50, activation=self.last_activation)
return current
self.activiation is set to tf.nn.relu and self.last_activiation is set to tf.nn.softmax
loss function and optimizer are created here:
self._nn = self._create_nn()
self._loss_function = tf.reduce_sum(tf.squared_difference(self._nn, self.Y), 1)
optimizer = tf.train.AdamOptimizer()
self._train_op = optimizer.minimize(self._loss_function)
I tried changing variables initialization by passing tf.random_normal_initializer(0.1, 0.1) as initializers however it did not result in any change in loss function.
I would be grateful for help in making this neural network work with ReLu.
Edit: Leaky ReLu has the same problem
Edit: Small example where I managed to duplicate same error:
x = tf.constant([[3., 211., 123., 78.]])
v = tf.Variable([0.5, 0.5, 0.5, 0.5])
h_d = tf.layers.Dense(4, activation=tf.nn.leaky_relu)
h = h_d(x)
y_d = tf.layers.Dense(4, activation=tf.nn.softmax)
y = y_d(h)
d = tf.constant([[.5, .5, 0, 0]])
Gradients (as calculated with tf.gradients) for h_d and y_d kernels and biases are either equal or close to 0