4
votes

I understand that the TensorFlow Object Detection API trains on custom object detection datasets using only rectangular bounding boxes, defined by xmin, ymin, xmax, and ymax. I also understand that polygon bounding boxes would greatly improve detection accuracy, since they exclude the unnecessary background information inside the box and so give a far better training dataset. I currently use labelImg to annotate all my training images, and it does offer polygon boxes. My question is: is there a way to modify the code in the TensorFlow API to work with polygon boxes instead of only rectangular ones?

1 Answer

1
votes

No. At this point you may be more interested in instance segmentation with a model like Mask R-CNN (not implemented in TensorFlow's Object Detection API at the time of writing). The models in the API have specific differentiable (and therefore trainable) layers that predict bounding boxes. A polygon has more degrees of freedom than a four-coordinate box, so a model that predicts polygons would be more complicated. Mask R-CNN partly solves the polygon problem by first detecting the object, then classifying each pixel within the bounding box as object or background.
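To make the box-versus-mask distinction concrete, here is a minimal sketch of how a polygon annotation could be turned into the two things discussed above: the rectangular box (xmin, ymin, xmax, ymax) that a box detector trains on, and a per-pixel object/background mask of the kind a segmentation model trains on. The function names `polygon_to_bbox` and `polygon_to_mask` are hypothetical helpers written for this illustration, not part of the TensorFlow API, and the sketch assumes the polygon is a simple list of (x, y) vertices in pixel coordinates.

```python
import numpy as np


def polygon_to_bbox(polygon):
    """Return (xmin, ymin, xmax, ymax) of the rectangle enclosing the vertices."""
    pts = np.asarray(polygon, dtype=float)
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    return float(xmin), float(ymin), float(xmax), float(ymax)


def polygon_to_mask(polygon, height, width):
    """Rasterize a polygon into a boolean mask (True = object pixel).

    Uses the even-odd rule: a pixel centre is inside if a horizontal ray
    cast to the right from it crosses the polygon boundary an odd number of times.
    """
    pts = np.asarray(polygon, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # x coordinate of each pixel centre
    py = ys + 0.5  # y coordinate of each pixel centre
    inside = np.zeros((height, width), dtype=bool)
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge cannot be crossed by a horizontal ray
        # Rows whose centre lies between the two endpoints of this edge.
        crosses = (y1 > py) != (y2 > py)
        # x position where the edge intersects each row's centre line.
        x_at_py = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle the pixels that lie to the left of the crossing point.
        inside ^= crosses & (px < x_at_py)
    return inside


# Hypothetical triangle annotation in a 10x10 image.
triangle = [(2, 2), (8, 2), (2, 8)]
bbox = polygon_to_bbox(triangle)
mask = polygon_to_mask(triangle, height=10, width=10)
print(bbox)             # (2.0, 2.0, 8.0, 8.0)
print(int(mask.sum()))  # 15 object pixels inside a 6x6 = 36 pixel box
```

In this toy case fewer than half of the pixels inside the rectangle belong to the object, which is the "unnecessary information" the question refers to. A box detector only ever sees the rectangle; a mask-based model is trained on the per-pixel labels instead.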

Here's an introduction to some of the popular algorithms used in object detection and how they work:

https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4?gi=b386f4274020