I've been working on a CNN to localize a coin in an image. The CNN outputs a bounding box for the coin (x_min, y_min, x_max, y_max) and four probabilities: image_contains_coin, image_doesnt_contain_coin, coin_too_close (the coin is too close to the camera), and dirty_coin (the coin is dirty rather than clean/shiny).
I'm using the results of the last layer's matmul directly for the bounding box outputs, while the probabilities go through an extra sigmoid. If there is no coin in the image, then the error of the bounding box and of the two remaining probabilities (coin_too_close and dirty_coin) should be ignored.
The error is calculated as the mean squared error of the bounding box plus the cross-entropy error of the sigmoid/probability outputs.
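In code terms, the loss I'm aiming for looks like this (a quick numpy sketch of the formula only, not my actual TF code; the 4-bounding-box/4-probability column split matches the output layout further down):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def total_loss(out, labels):
    # out: [N, 8] raw network outputs; labels: [N, 8] training targets.
    # Columns 0-3 hold the bounding box, columns 4-7 the probabilities.
    bbox_mse = np.mean((out[:, :4] - labels[:, :4]) ** 2)
    p = sigmoid(out[:, 4:])
    # element-wise binary cross-entropy, averaged over batch and outputs
    prob_xent = np.mean(-labels[:, 4:] * np.log(p)
                        - (1.0 - labels[:, 4:]) * np.log(1.0 - p))
    return bbox_mse + prob_xent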
Is this (the following code) the correct way to do this kind of thing in TensorFlow, and are there any problems with my ideas or code?
For reference, these constants show where each output sits in the output array:
output_x1 = 0
output_y1 = 1
output_x2 = 2
output_y2 = 3
output_is_coin = 4
output_is_not_coin = 5
output_is_too_close = 6
output_is_dirty = 7
num_outputs = 8
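The training labels (image_infos in the code below) use the same 8-column layout; a made-up example batch:

import numpy as np

# [x1, y1, x2, y2, is_coin, is_not_coin, is_too_close, is_dirty]
coin_row = np.array([0.21, 0.35, 0.48, 0.62, 1.0, 0.0, 0.0, 1.0])  # dirty coin in frame
no_coin_row = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0])   # no coin: bbox, too_close
                                                                    # and dirty are don't-cares
image_infos_example = np.stack([coin_row, no_coin_row])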
And the output layer and cost model:
with tf.variable_scope('output'):
    # Output, class prediction
    out = cnn.output_layer(fc1, 256, num_outputs, 'out')
    out_for_error = out
    # Slice up the outputs and add a sigmoid to the probabilities
    aabb_out = tf.slice(out, [0, 0], [-1, 4])
    prob_out = tf.slice(out, [0, 4], [-1, 4])
    prob_out = tf.nn.sigmoid(prob_out)
    self.out = tf.concat([aabb_out, prob_out], 1, 'O')
with tf.variable_scope('error'):
    # If the image does not contain a coin, the error for every output other
    # than is_coin/is_not_coin needs to be ignored. This is done by replacing
    # those components of the output array with the desired values (from the
    # training data) rather than the output of the nn, so their error terms
    # carry no gradient.
    image_is_coin = tf.slice(image_infos, [0, 4], [-1, 1])
    is_coin_mask = tf.constant([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0])
    not_coin_mask = tf.constant([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
    error_mask = tf.add(tf.multiply(is_coin_mask, image_is_coin), not_coin_mask)
    inv_error_mask = tf.subtract(1.0, error_mask)
    masked_error = tf.multiply(error_mask, out_for_error) + tf.multiply(inv_error_mask, image_infos)
    # Slice the outputs to run mean squared error on the bounding box and
    # cross-entropy on the probabilities
    aabb_error = tf.slice(masked_error, [0, 0], [-1, 4])
    prob_error = tf.slice(masked_error, [0, 4], [-1, 4])
    image_infos_aabbs = tf.slice(image_infos, [0, 0], [-1, 4])
    image_infos_probs = tf.slice(image_infos, [0, 4], [-1, 4])
    self.error = tf.add(
        tf.reduce_mean(tf.square(tf.subtract(aabb_error, image_infos_aabbs))),
        tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=prob_error, labels=image_infos_probs)))
with tf.variable_scope('optimizer'):
    # Define loss and optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    self.train_op = optimizer.minimize(self.error)
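To convince myself the masking does what I think it does, I checked the substitution trick with plain numpy (values made up; sigmoid_xent mirrors the stable formula from the tf.nn.sigmoid_cross_entropy_with_logits docs):

import numpy as np

def sigmoid_xent(logits, labels):
    # max(x, 0) - x*z + log(1 + exp(-|x|)), as in the TF docs
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

is_coin_mask = np.array([1., 1., 1., 1., 0., 0., 1., 1.])
not_coin_mask = np.array([0., 0., 0., 0., 1., 1., 0., 0.])

out = np.array([[0.3, 0.1, 0.9, 0.7, -2.0, 3.0, 0.5, -1.0]])   # raw network output
labels = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]])  # a "no coin" label row

image_is_coin = labels[:, 4:5]                       # 0.0 for this row
mask = is_coin_mask * image_is_coin + not_coin_mask  # [0, 0, 0, 0, 1, 1, 0, 0]
masked = mask * out + (1.0 - mask) * labels

print(np.mean((masked[:, :4] - labels[:, :4]) ** 2))  # 0.0 -- the bbox term vanishes
print(sigmoid_xent(masked[:, 4:], labels[:, 4:]))     # masked columns each contribute a
                                                      # constant log(2), with zero gradient

So the bounding box term really is zeroed out, and the substituted probability columns only add a constant offset to the reported loss; no gradient flows through them, since the substituted values come straight from the labels.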
So, how does that look? Am I doing this the right way? The results seem alright, but it feels like something could be improved.