How to get multiple bounding box coordinates in the TensorFlow Object Detection API

Question

I want to get the coordinates of multiple bounding boxes, along with the class of each box, and return them as a JSON file.

When I print boxes[] from the following code, it has a shape of (1,300,4), so there are 300 sets of coordinates in boxes[]. But only 2 boxes are drawn on my predicted image. I want the coordinates of the bounding boxes that actually appear on my image.

Also, how would we know which bounding box is mapped to which category/class in the image?

For example, let's say I have a dog and a person in an image. How would I know which bounding box corresponds to the dog class and which one to the person class? The boxes[] array has shape (1,300,4) and gives no indication of which bounding box corresponds to which class in the image.

I followed this answer to select bounding boxes from the 300 entries in boxes[] using a score threshold.

I've tried getting the bounding box with the highest score, but that returns only a single bounding box even when the predicted image has multiple bounding boxes.

The coordinates of the highest-scoring bounding box don't even match the bounding box coordinates on the predicted image. How do I get the coordinates of the bounding boxes that are on my predicted image?

            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            im = Image.fromarray(image_np)

            true_boxes = boxes[0][scores[0]==scores.max()]    # Gives us the box with max score
            for i in range(true_boxes.shape[0]):   # rescaling the coordinates
                ymin = true_boxes[i,0]*height
                xmin = true_boxes[i,1]*width
                ymax = true_boxes[i,2]*height
                xmax = true_boxes[i,3]*width

The coordinates I get from the code above (xmin, ymin, xmax, ymax for the box with the max score) don't exactly match the bounding box coordinates on the predicted image. They are off by a few pixels. Also, I only get one bounding box even though the predicted image has multiple bounding boxes and multiple classes (e.g. a dog and a person).

I would like to return a JSON file with the image_name, the bounding_boxes, and the class corresponding to each bounding box.

Thanks, I'm new to this. Please ask if you didn't understand any part of the question.

One common mistake is mixing up height and width. Have you tried switching the two to see if the results are correct? — danyfang
Height and width aren't the issue. I've checked the highest-scoring bounding box in boxes[] and compared it to the bounding box on my image. They are pretty close but a few pixels off, so I don't think the height and width values are the problem. — keshav N

Tong Tong · Accepted Answer · 2019-07-09T03:58:43

I followed this answer and found all of my bounding box coordinates by keeping every box whose score exceeds a threshold:

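import cv2

# boxes, scores, image, height and width come from the detection run above;
# image is assumed to be the original frame as a NumPy array.
# Keep every detection whose score exceeds the threshold, not just the best one.
min_score_thresh = 0.60
true_boxes = boxes[0][scores[0] > min_score_thresh]
for i in range(true_boxes.shape[0]):
    # Boxes are normalized [ymin, xmin, ymax, xmax]; scale to pixel coordinates.
    ymin = int(true_boxes[i, 0] * height)
    xmin = int(true_boxes[i, 1] * width)
    ymax = int(true_boxes[i, 2] * height)
    xmax = int(true_boxes[i, 3] * width)

    # Crop the region of interest and save it as its own image.
    roi = image[ymin:ymax, xmin:xmax].copy()
    cv2.imwrite("box_{}.jpg".format(i), roi)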
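
The snippet above recovers the boxes but not their classes or the JSON output the question asks for. In the Object Detection API outputs, entry i of boxes, classes and scores describes the same detection, so applying one score mask to all three keeps them aligned. The following is only a sketch under assumptions: the helper name detections_to_records, the file name example.jpg, the record layout, and the small sample arrays are all made up for illustration, and category_index is assumed to have the usual {id: {'id': ..., 'name': ...}} shape.

```python
import json
import numpy as np


def detections_to_records(boxes, classes, scores, category_index,
                          width, height, min_score_thresh=0.6):
    """Hypothetical helper: one dict per detection above the score threshold."""
    # Drop the leading batch dimension: (1, N, 4) -> (N, 4) and (1, N) -> (N,)
    boxes = np.squeeze(boxes, axis=0)
    classes = np.squeeze(classes, axis=0).astype(np.int32)
    scores = np.squeeze(scores, axis=0)

    # One boolean mask applied to all three arrays keeps them index-aligned.
    keep = scores > min_score_thresh
    records = []
    for box, cls, score in zip(boxes[keep], classes[keep], scores[keep]):
        ymin, xmin, ymax, xmax = box  # normalized coordinates in [0, 1]
        name = category_index.get(int(cls), {}).get("name", "unknown")
        records.append({
            "class": name,
            "score": float(score),
            "box": {
                "xmin": int(xmin * width),
                "ymin": int(ymin * height),
                "xmax": int(xmax * width),
                "ymax": int(ymax * height),
            },
        })
    return records


# Made-up sample data standing in for real model output (3 detections, not 300).
boxes = np.array([[[0.25, 0.5, 0.75, 1.0],
                   [0.0, 0.25, 0.5, 0.5],
                   [0.5, 0.5, 0.75, 0.75]]])
classes = np.array([[18.0, 1.0, 1.0]])
scores = np.array([[0.9, 0.8, 0.1]])
category_index = {1: {"id": 1, "name": "person"},
                  18: {"id": 18, "name": "dog"}}

records = detections_to_records(boxes, classes, scores, category_index,
                                width=200, height=100)
payload = json.dumps({"image_name": "example.jpg", "detections": records},
                     indent=2)
print(payload)
```

With real model output, the same call would take the boxes, classes and scores arrays returned by the session run, plus the image's actual width and height. The threshold plays the same role as min_score_thresh in the answer's snippet.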
