I'm trying to use the object_detection module from the TensorFlow library to detect colored squares. For every image in the train/eval dataset, I need the bounding box coordinates (with the origin in the top-left corner), defined by 4 floating-point numbers [ymin, xmin, ymax, xmax]. Now, let's suppose background_image is a completely white 300 x 300 px image. The code of my image generator looks like this (pseudocode):
new_image = background_image.copy()             # start from the white 300 x 300 background
rand_x, rand_y = random_coordinates(new_image)  # random top-left corner of the square
for i in range(rand_x, rand_x + 100):
    for j in range(rand_y, rand_y + 100):
        new_image[i][j] = color(red)            # paint a 100 x 100 area red
...so now we have a 300 x 300 px image with a 100 x 100 px red square on a white background. The question is: should my bounding box contain only the red pixels, i.e. [rand_x, rand_y, rand_x + 100, rand_y + 100], or should it also include a "white frame", e.g. [rand_x - 5, rand_y - 5, rand_x + 105, rand_y + 105]? Or maybe it doesn't matter at all?
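For completeness, this is roughly how I turn the tight box into the normalized [ymin, xmin, ymax, xmax] values of a training example (a minimal sketch using the usual TF Object Detection API TFRecord fields; encoded_png, the class name 'red_square' and label id 1 are just placeholders for my setup, and fields like image/filename are omitted):

import tensorflow as tf
from object_detection.utils import dataset_util

height, width = 300, 300
# tight box around the red pixels, normalized to [0, 1];
# rand_x indexes rows (y) and rand_y indexes columns (x) in the generator above
ymin = rand_x / float(height)
xmin = rand_y / float(width)
ymax = (rand_x + 100) / float(height)
xmax = (rand_y + 100) / float(width)

feature = {
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/encoded': dataset_util.bytes_feature(encoded_png),  # PNG bytes of new_image
    'image/format': dataset_util.bytes_feature(b'png'),
    'image/object/bbox/ymin': dataset_util.float_list_feature([ymin]),
    'image/object/bbox/xmin': dataset_util.float_list_feature([xmin]),
    'image/object/bbox/ymax': dataset_util.float_list_feature([ymax]),
    'image/object/bbox/xmax': dataset_util.float_list_feature([xmax]),
    'image/object/class/text': dataset_util.bytes_list_feature([b'red_square']),
    'image/object/class/label': dataset_util.int64_list_feature([1]),
}
tf_example = tf.train.Example(features=tf.train.Features(feature=feature))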
After 15 hours of training and evaluation (with bounding box coordinates = [rand_x, rand_y, rand_x + 100, rand_y + 100]), TensorBoard reports a precision of about 0.1.
I understand that after only 1,100 steps the results shouldn't be breathtaking; I just want to rule out potential inaccuracies caused by a mistake on my part.