I have code that extracts the image width and height, as well as the class, xmin, xmax, ymin, and ymax of each bounding box, but it is not clear to me how to populate the variables below to generate the TFRecords:
height = None # Image height
width = None # Image width
filename = None # Filename of the image. Empty if image is not from file
encoded_image_data = None # Encoded image bytes
image_format = None # b'jpeg' or b'png'
xmins = [] # List of normalized left x coordinates in bounding box (1 per box)
xmaxs = [] # List of normalized right x coordinates in bounding box (1 per box)
ymins = [] # List of normalized top y coordinates in bounding box (1 per box)
ymaxs = [] # List of normalized bottom y coordinates in bounding box (1 per box)
classes_text = [] # List of string class name of bounding box (1 per box)
classes = [] # List of integer class id of bounding box (1 per box)
For multiple bounding boxes per image, how should xmins, xmaxs, ymins, ymaxs, and classes be populated? Should they be row vectors or column vectors? Also, should classes_text hold the class names in the same order as the bounding boxes? Finally, what is expected in encoded_image_data?
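To make the question concrete, here is a minimal sketch of how I currently understand the fields, in plain Python with no TensorFlow calls. It assumes that each field is a flat Python list with one entry per box (so neither a row nor a column vector), that all six lists share the same box order, that coordinates are normalized by dividing by the image width or height, and that encoded_image_data is simply the raw bytes of the compressed JPEG/PNG file. The names `LABEL_MAP`, `build_box_fields`, and `read_encoded_image` are hypothetical helpers I made up for illustration, and the class ids starting at 1 are also an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical label map (assumption: ids start at 1, 0 is left for background).
LABEL_MAP: Dict[str, int] = {"cat": 1, "dog": 2}

# One box as (class name, xmin, ymin, xmax, ymax), all in pixel units.
Box = Tuple[str, float, float, float, float]


def build_box_fields(boxes: List[Box], width: int, height: int) -> Dict[str, list]:
    """Turn pixel-space boxes into flat, parallel lists (one entry per box)."""
    xmins, xmaxs, ymins, ymaxs = [], [], [], []
    classes_text, classes = [], []
    for name, xmin, ymin, xmax, ymax in boxes:
        # Normalize x by image width and y by image height, giving values in [0, 1].
        xmins.append(xmin / width)
        xmaxs.append(xmax / width)
        ymins.append(ymin / height)
        ymaxs.append(ymax / height)
        # Class names are stored as bytes, in the same order as the boxes.
        classes_text.append(name.encode("utf8"))
        classes.append(LABEL_MAP[name])
    return {
        "xmins": xmins, "xmaxs": xmaxs,
        "ymins": ymins, "ymaxs": ymaxs,
        "classes_text": classes_text, "classes": classes,
    }


def read_encoded_image(path: str) -> bytes:
    """Return the file's bytes as-is (still JPEG/PNG compressed, not decoded pixels)."""
    with open(path, "rb") as f:
        return f.read()


# Two boxes on a 400x400 image; index i refers to the same box in every list.
fields = build_box_fields(
    [("cat", 10, 20, 110, 220), ("dog", 200, 40, 400, 240)],
    width=400,
    height=400,
)
```

If this understanding is right, each list would then be wrapped in the matching feature type (float list for the coordinates, bytes list for classes_text, int64 list for classes) when building the tf.train.Example. Is that correct?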