
I have code that gets the image width and height, plus the class, xmin, xmax, ymin, and ymax of each bounding box, but it is not clear how to populate the variables needed to generate the TFRecords. The template code is below:

height = None # Image height
width = None # Image width
filename = None # Filename of the image. Empty if image is not from file
encoded_image_data = None # Encoded image bytes
image_format = None # b'jpeg' or b'png'

xmins = [] # List of normalized left x coordinates in bounding box (1 per box)
xmaxs = [] # List of normalized right x coordinates in bounding box (1 per box)
ymins = [] # List of normalized top y coordinates in bounding box (1 per box)
ymaxs = [] # List of normalized bottom y coordinates in bounding box (1 per box)
classes_text = [] # List of string class name of bounding box (1 per box)
classes = [] # List of integer class id of bounding box (1 per box)

For multiple bounding boxes per image, how should xmins, xmaxs, ymins, ymaxs, and classes be populated? Should they be row vectors or column vectors? Also, should classes_text list the class names in the same order as the bounding boxes? Finally, what is expected in encoded_image_data?


1 Answer


Here is a guide to setting up a custom dataset for the TensorFlow Object Detection API: https://github.com/tensorflow/models/blob/master/object_detection/g3doc/using_your_own_dataset.md

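In your case, xmins, xmaxs, ymins, and ymaxs should each be an ordinary Python list with one entry per bounding box, so there is no row-versus-column distinction. classes_text and classes follow the same box order. The image data should be encoded as JPEG or PNG (I believe both can be used interchangeably, but I recommend sticking to one format for consistency if possible).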
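To make the list layout concrete, here is a minimal sketch in plain Python. It assumes your boxes are available as pixel coordinates and that you already have the encoded image bytes (for example, the raw contents of a .jpg file read in binary mode). The function name `build_example_fields`, the tuple layout of `boxes`, and the `label_map` dictionary with its ids are hypothetical, chosen only for illustration.

```python
def build_example_fields(encoded_image_data, image_format, width, height,
                         boxes, label_map):
    """Collect the per-image fields used by the template in the question.

    boxes: list of (class_name, xmin, xmax, ymin, ymax) in pixels
           (hypothetical layout for this sketch).
    label_map: dict mapping class name to integer id (hypothetical).
    """
    xmins, xmaxs, ymins, ymaxs = [], [], [], []
    classes_text, classes = [], []

    for class_name, xmin, xmax, ymin, ymax in boxes:
        # Normalize pixel coordinates to [0, 1] by the image size.
        xmins.append(xmin / width)
        xmaxs.append(xmax / width)
        ymins.append(ymin / height)
        ymaxs.append(ymax / height)
        # Index i of every list describes the same box.
        classes_text.append(class_name.encode("utf8"))
        classes.append(label_map[class_name])

    return {
        "height": height,
        "width": width,
        "encoded_image_data": encoded_image_data,  # raw JPEG/PNG file bytes
        "image_format": image_format,              # b'jpeg' or b'png'
        "xmins": xmins,
        "xmaxs": xmaxs,
        "ymins": ymins,
        "ymaxs": ymaxs,
        "classes_text": classes_text,
        "classes": classes,
    }


# Example: one 400x400 image with two boxes (placeholder bytes, not a real image).
label_map = {"cat": 1, "dog": 2}
boxes = [("cat", 10, 110, 20, 220), ("dog", 200, 400, 100, 300)]
fields = build_example_fields(b"placeholder-bytes", b"jpeg", 400, 400,
                              boxes, label_map)
```

Each list is flat and has the same length as the number of boxes. These values would then be wrapped into a tf.train.Example as described in the linked guide.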