
I'm new to using TensorFlow's Object Detection API, but I understand that I need to convert a CSV file to a TFRecord. As I understand it, the CSV should have 8 columns, as follows:

filename, width, height, class, xmin, xmax, ymin, ymax

What I'm confused about is which corner of the image is assumed to be the origin.

Thanks for any help!


1 Answer


The top left corner of the image is assumed to be the origin (0,0), with the width (x coordinates) increasing as you move to the right and the height (y coordinates) increasing as you move downwards.

So the bottom-right pixel of the image is indexed as (width-1, height-1).
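As a quick illustration of that convention, here is a minimal sketch in plain Python. The 4x3 image and the variable names are made up for the example; the image is stored row by row, so a pixel is looked up as image[y][x].

```python
# Hypothetical 4x3 image (width=4, height=3), stored row by row.
# Each "pixel" simply holds its own (x, y) coordinate for illustration.
width, height = 4, 3
image = [[(x, y) for x in range(width)] for y in range(height)]

# Row 0 is the top row and column 0 is the leftmost column.
top_left = image[0][0]
# The last row and last column hold the bottom-right pixel.
bottom_right = image[height - 1][width - 1]

print(top_left)
print(bottom_right)
```

Note that the row (y) index comes first when indexing the nested list, even though coordinates are usually written as (x, y).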

The format you described is essentially the Pascal VOC annotation format, in which, for a particular bounding box:

xmin denotes the x coordinate of the top left corner
ymin denotes the y coordinate of the top left corner
xmax denotes the x coordinate of the bottom right corner
ymax denotes the y coordinate of the bottom right corner
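The sketch below shows how these four values relate to each other under a top-left origin. It uses only the standard library, and the sample row, the `normalize_box` helper, and the validation rule are assumptions made for illustration, not part of the Object Detection API. Dividing by the image width and height is shown because, as far as I know, the commonly used CSV-to-TFRecord scripts store box coordinates as fractions of the image size; check the script you are using.

```python
import csv
import io

# Hypothetical one-row CSV using the column order from the question.
SAMPLE_CSV = """filename,width,height,class,xmin,xmax,ymin,ymax
dog.jpg,640,480,dog,100,300,50,200
"""

def normalize_box(row):
    """Return (xmin, ymin, xmax, ymax) scaled to the range [0, 1]."""
    width, height = int(row["width"]), int(row["height"])
    xmin, xmax = int(row["xmin"]), int(row["xmax"])
    ymin, ymax = int(row["ymin"]), int(row["ymax"])
    # With a top-left origin, the top-left corner has the smaller x and y,
    # so xmin < xmax and ymin < ymax must hold for a valid box.
    if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
        raise ValueError(f"box is out of order or outside the image: {row}")
    return (xmin / width, ymin / height, xmax / width, ymax / height)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
boxes = [normalize_box(r) for r in rows]
print(boxes[0])
```

If a box fails the check, a likely cause is that the annotation tool used a bottom-left origin or that the x and y columns were swapped.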