
I'm trying to use object_detection from the TensorFlow library to detect colored squares. For every image in the training/evaluation dataset, I need the bounding box coordinates (with the origin in the top-left corner), given as 4 floating point numbers [ymin, xmin, ymax, xmax]. Now, suppose background_image is a completely white 300 x 300 px image. The code of my image generator looks like this (pseudocode):

new_image = background_image.copy()
# Top-left corner of the square; the first index of new_image starts at rand_x
rand_x, rand_y = random_coordinates(new_image)
for i in range(rand_x, rand_x + 100):
    for j in range(rand_y, rand_y + 100):
        new_image[i][j] = color(red)

...so now we have a 300 x 300 px image of a 100 x 100 px red square on a white background. The question is: should my bounding box contain only the red pixels, [rand_x, rand_y, rand_x + 100, rand_y + 100], or should it include a "white frame", like [rand_x - 5, rand_y - 5, rand_x + 105, rand_y + 105]? Or maybe it does not matter? After 15 h of training and evaluating (with bounding box coordinates = [rand_x, rand_y, rand_x + 100, rand_y + 100]), TensorBoard shows me something like this:

TensorBoard reports that precision is about 0.1.

I understand that the results after only 1100 steps should not be breathtaking. I just want to rule out inaccuracies caused by a mistake on my side.
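
To make the setup concrete, here is a minimal Python sketch of such a generator. It assumes the image is a plain list of rows of RGB tuples rather than a real image library, and the names make_sample and normalized are hypothetical helpers, not part of TensorFlow. It returns the tight box [ymin, xmin, ymax, xmax] that contains only the red pixels, with the first image index treated as y (rows) and the second as x (columns).

```python
import random

WHITE = (255, 255, 255)
RED = (255, 0, 0)


def make_sample(size=300, square=100, seed=None):
    """Return a white size x size image with one red square and its tight box."""
    rng = random.Random(seed)
    image = [[WHITE] * size for _ in range(size)]

    # Pick the top-left corner so the whole square stays inside the image.
    # random.Random.randint includes both endpoints.
    ymin = rng.randint(0, size - square)
    xmin = rng.randint(0, size - square)
    ymax, xmax = ymin + square, xmin + square

    # The first index is the row (y), the second is the column (x).
    for y in range(ymin, ymax):
        for x in range(xmin, xmax):
            image[y][x] = RED

    return image, [ymin, xmin, ymax, xmax]


def normalized(box, size=300):
    """Scale pixel coordinates to [0, 1], assuming a square image."""
    return [v / size for v in box]
```

Whether the pipeline wants pixel coordinates or values scaled to [0, 1] depends on how the dataset records are written, so the normalized helper is only needed in the latter case.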

Please update the link. (Gal_M)
Thank you, links updated. (Vether)

1 Answer


Ideally, you want your predicted boxes to overlap the ground truth boxes perfectly.

This means that if A = [y_min, x_min, y_max, x_max] is the ground truth box, you want B (the predicted box) to be equal to A, i.e. A = B.

During the training phase it is perfectly normal for your predictions to be "around" the ground truth without matching it perfectly.

In reality, even during the test phase (at the end of training) A = B is difficult to achieve, because no classifier/regressor is perfect.
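
How close B is to A is commonly measured with intersection over union (IoU): the area shared by the two boxes divided by the area they cover together. The sketch below assumes both boxes use the [y_min, x_min, y_max, x_max] layout; the iou function and the example coordinates are illustrative and not taken from the TensorFlow API.

```python
def iou(box_a, box_b):
    """Intersection over union of two [y_min, x_min, y_max, x_max] boxes."""
    # Corners of the overlapping rectangle, if there is one.
    y_min = max(box_a[0], box_b[0])
    x_min = max(box_a[1], box_b[1])
    y_max = min(box_a[2], box_b[2])
    x_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    intersection = max(0, y_max - y_min) * max(0, x_max - x_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


tight = [50, 50, 150, 150]    # 100 x 100 box around the red pixels only
padded = [45, 45, 155, 155]   # same box with a 5 px white frame

print(iou(tight, tight))               # 1.0
print(round(iou(tight, padded), 3))    # 0.826
```

An IoU of 1.0 corresponds to A = B. In this example a 5 px frame around a 100 x 100 square already lowers the IoU to about 0.83.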

In short: your predictions look fine. With more training epochs you'll probably get better results.