I'm using TensorFlow for a multi-target regression problem, specifically a fully convolutional residual network for pixel-wise labeling, where the input is an image and the label is a mask. In my case the images are brain MR scans and the labels are masks of the tumors.
I have achieved fairly decent results with my net:
I am sure there is still room for improvement, though, so I wanted to add batch normalization. I implemented it as follows:
# Convolutional layer 1: conv -> batch norm -> ReLU
Z10 = tf.nn.conv2d(X, W_conv10, strides=[1, 1, 1, 1], padding='SAME')
Z10 = tf.contrib.layers.batch_norm(Z10, center=True, scale=True, is_training=train_flag)
A10 = tf.nn.relu(Z10)

# Convolutional layer 2: strided conv -> batch norm -> ReLU
Z1 = tf.nn.conv2d(A10, W_conv1, strides=[1, 2, 2, 1], padding='SAME')
Z1 = tf.contrib.layers.batch_norm(Z1, center=True, scale=True, is_training=train_flag)
A1 = tf.nn.relu(Z1)
for each of the conv and transposed-conv layers of my net. But the results are not what I expected: the net with batch normalization performs terribly. The orange curve is the loss of the net without batch normalization, while the blue curve is the net with it:
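One thing I was unsure about is how the moving averages get updated. My understanding is that tf.contrib.layers.batch_norm places its update ops in tf.GraphKeys.UPDATE_OPS by default, so the train op needs a control dependency on them. This is a minimal sketch of how I believe that wiring is supposed to look (train_flag is the placeholder fed to is_training above, cost is the loss defined below, and the optimizer and learning rate are just illustrative):

# boolean placeholder fed to is_training of every batch_norm layer
train_flag = tf.placeholder(tf.bool, name='train_flag')

# batch_norm registers its moving-average updates in UPDATE_OPS by default,
# so the train op must depend on them or the inference statistics never update
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(cost)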
Not only is the net learning more slowly, the predicted labels of the net with batch normalization are also very poor.
Does anyone know why this might be the case? Could it be my cost function? I am currently using:
# pixel-wise sigmoid cross-entropy between the final logits dA1 and the mask Y
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=dA1, labels=Y)
cost = tf.reduce_mean(loss)
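For completeness, at evaluation time I get the predicted masks roughly like this (a simplified sketch; sess, X_val, and the 0.5 threshold are just how I've set things up, with train_flag fed as False so batch norm uses its moving statistics):

# probabilities from the final logits, thresholded to a binary mask
pred_mask = tf.cast(tf.sigmoid(dA1) > 0.5, tf.float32)

# train_flag must be False here so batch_norm uses the accumulated
# moving mean/variance instead of the statistics of the evaluation batch
predictions = sess.run(pred_mask, feed_dict={X: X_val, train_flag: False})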