Bounding box format in TensorFlow Object Detection API

Question

Thanks in advance.
I am trying to use the TensorFlow Object Detection API, following the manual and web tutorials.
But I am confused about the bounding box format in the TensorFlow Object Detection API.
In the tutorial, TODA (TensorFlow Object Detection API) provides several pretrained models, which are trained on the COCO dataset.

In the COCO dataset,
the bbox format is [xmin, ymin, width, height].
There are many other bbox formats, such as [centerx, centery, width, height] or [xmin, ymin, xmax, ymax].

Which bbox format should I use for TODA? (Should I use the COCO format?)
I can't find any info regarding this.

The x axis and y axis are also confusing to me. I understand that x corresponds to width and y to height.

But in the TODA code, I found this:
def assert_or_prune_invalid_boxes(boxes):
  ...
  ymin, xmin, ymax, xmax = tf.split(boxes, num_or_size_splits=4, axis=1)

Why are x and y switched?
Is the TODA axis order different from others?

Thanks.

maryam mehboob · Accepted Answer · 2021-05-01T08:38:51

Background: There are two common annotation formats for images, Pascal VOC and COCO. Each has its own specification; here are the main differences between them:

Pascal VOC:

Stores annotations in .xml files.
Bounding box format: [x-top-left, y-top-left, x-bottom-right, y-bottom-right].
Creates a separate XML annotation file for each image in the dataset.

COCO:

Stores annotations in .json files.
Bounding box format: [x-top-left, y-top-left, width, height].
Creates one annotation file each for the training, testing, and validation sets.
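To make the difference between the two box layouts concrete, here is a minimal sketch of converting between them in plain Python. The function names `coco_to_voc` and `voc_to_coco` are hypothetical helpers written for this illustration; they are not part of TODA or of either dataset's tooling.

```python
def coco_to_voc(box):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


def voc_to_coco(box):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# Example box in COCO layout: top-left corner at (50, 30), 200 wide, 100 tall.
coco_box = [50, 30, 200, 100]
voc_box = coco_to_voc(coco_box)
print(voc_box)               # [50, 30, 250, 130]
print(voc_to_coco(voc_box))  # [50, 30, 200, 100]
```

Both layouts share the same top-left corner; only the last two values differ (size versus bottom-right corner).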

Which bbox format should I use for TODA? (Should I use the COCO format?)

It depends on the annotation format you are using for your dataset. If your annotations are in .xml format, use the Pascal VOC bounding box format; if they are in .json format, use the COCO bounding box format.

The x axis and y axis are also confusing. Why are x and y switched? Is the TODA axis order different from others?

There is no need to be confused here: if you are using the Pascal VOC format, your annotation files must contain [x-top-left, y-top-left, x-bottom-right, y-bottom-right], that is, [x-min, y-min, x-max, y-max].

And if you are going with the COCO format, your annotations must contain [x-top-left, y-top-left, width, height], that is, [x, y, width, height].
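As for the snippet in the question, it unpacks each box as ymin, xmin, ymax, xmax, so inside that code the row (y) coordinate comes before the column (x) coordinate. The sketch below reorders a Pascal VOC style box into that order. The helper name `voc_to_yxyx` is hypothetical, and dividing by the image size to get values in [0, 1] is an assumption about how boxes are commonly passed to the API; check the documentation of the function you actually call.

```python
def voc_to_yxyx(box, image_width, image_height):
    """Reorder [x_min, y_min, x_max, y_max] (pixels) into
    [y_min, x_min, y_max, x_max], scaled to the 0..1 range.

    The scaling step is an assumption; drop it if pixel values are expected.
    """
    x_min, y_min, x_max, y_max = box
    return [
        y_min / image_height,
        x_min / image_width,
        y_max / image_height,
        x_max / image_width,
    ]


# A box in Pascal VOC layout on an image 500 pixels wide and 200 pixels tall.
voc_box = [50, 30, 250, 130]
print(voc_to_yxyx(voc_box, image_width=500, image_height=200))
# [0.15, 0.1, 0.65, 0.5]
```

The geometry is unchanged; only the order of the four numbers (and optionally their scale) differs, which is why the annotation files can stay in Pascal VOC or COCO layout.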
