
When I use labelImg (https://github.com/tzutalin/labelImg) to draw bounding boxes around my objects, the annotation.xml file it outputs gives the coordinates of each bounding box. I feed these annotations into object detection models (ssd_mobilenet_v1_coco and faster_rcnn_resnet101_coco) in TensorFlow. The predicted box coordinates (xmin, ymin, xmax, ymax) range from 0 to 1.

Are the coordinates in my annotation.xml also normalized to the range 0 to 1? I want to know this because I would like to compute the IoU by passing the ground truth and predicted bounding boxes into my own IoU function. Thank you.




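No. labelImg saves annotations in PASCAL VOC XML format, where xmin, ymin, xmax and ymax are absolute pixel coordinates. However, if you feed your model with a TFRecord file, that file contains your image and the normalized coordinates of your bounding boxes. So your conversion from .xml files to a TFRecord file is what normalizes the bounding box coordinates (x values are divided by the image width, y values by the image height).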
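To check this on your own files, here is a minimal sketch that reads a labelImg PASCAL VOC annotation with the standard library and normalizes the boxes the same way the XML-to-TFRecord conversion does. It assumes the usual `<size><width>/<height>` and `<object><bndbox>` layout; the function names `read_labelimg_boxes` and `normalize_box` are hypothetical helpers, not part of labelImg or TensorFlow.

```python
import xml.etree.ElementTree as ET


def read_labelimg_boxes(xml_text):
    """Return (width, height, boxes) from a PASCAL VOC XML string.

    Each box is an absolute pixel tuple (xmin, ymin, xmax, ymax).
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append(tuple(float(bb.find(tag).text)
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return width, height, boxes


def normalize_box(box, width, height):
    """Scale a pixel box into the 0..1 range used inside the TFRecord."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)


# Hypothetical annotation for a 640x480 image.
example = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>64</xmin><ymin>48</ymin><xmax>320</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>"""

w, h, boxes = read_labelimg_boxes(example)
print(boxes)                                    # [(64.0, 48.0, 320.0, 240.0)]
print([normalize_box(b, w, h) for b in boxes])  # [(0.1, 0.1, 0.5, 0.5)]
```

The pixel values printed first are what labelImg wrote; only the second line is in the same 0 to 1 range as the model's predictions.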

Your model's output will be in normalized coordinates too. You can easily rescale them to pixels by multiplying by the image size:

x_min_abs = x_min_rel * image_width
x_max_abs = x_max_rel * image_width
y_min_abs = y_min_rel * image_height
y_max_abs = y_max_rel * image_height
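Putting this together for the IoU comparison, a minimal sketch: rescale the prediction to pixels as above, then compute IoU against the pixel ground truth from the XML. Both boxes must be in the same coordinate system and the same (xmin, ymin, xmax, ymax) order. Note that, as far as I know, the TensorFlow Object Detection API's `detection_boxes` output is ordered [ymin, xmin, ymax, xmax], so you may need to reorder it first. The helpers `to_pixels` and `iou` and the example numbers are hypothetical.

```python
def to_pixels(box_rel, image_width, image_height):
    """Scale a normalized (xmin, ymin, xmax, ymax) box to pixel coordinates."""
    xmin, ymin, xmax, ymax = box_rel
    return (xmin * image_width, ymin * image_height,
            xmax * image_width, ymax * image_height)


def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes
    expressed in the same coordinate system."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = (64, 48, 320, 240)            # ground truth from annotation.xml, in pixels
pred_rel = (0.1, 0.1, 0.3, 0.5)    # hypothetical normalized prediction
pred = to_pixels(pred_rel, 640, 480)
print(iou(gt, pred))               # 0.5
```

Since IoU is a ratio of areas, you could equally normalize the ground truth instead and compare in the 0 to 1 space; what matters is not mixing the two.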