How to make sense of Open Images Dataset's bounding-box annotations?

Question

So I downloaded the Open Images Dataset via TensorFlow Datasets (https://www.tensorflow.org/datasets). I can view the images and annotations fine but I can't make sense of the weird format they are using for the object bounding boxes.

For example: I have an image showing an elephant with a width of 682 and a height of 1024. The bounding box coordinates of the elephant are: [0.03875 , 0.188732, 0.954375, 0.979343]. According to the documentation the 4 numbers represent xMin, xMax, yMin, yMax.

How do I display this weirdly small rectangle with, let's say, matplotlib?
I already tried multiplying the x coordinates by the width and the y coordinates by the height, but the resulting rectangles don't make any sense. I also tried swapping the values for x_1 and x_2 etc., but that didn't work either.

This is my code:

for e in train_data:

    np_img = e["image"]

    height = np.shape(np_img)[0]
    width = np.shape(np_img)[1]

    fig, ax = plt.subplots(1)

    ax.imshow(np_img)

    for bbox in e["bobjects"]["bbox"]:

        x_1 = bbox[0]
        x_2 = bbox[1]

        y_1 = bbox[2]
        y_2 = bbox[3]

        rect = patches.Rectangle((x_1 * width, y_2 * height), (x_2 * width - x_1 * width), (y_1 * height - y_2 * height), linewidth=1, edgecolor='r', facecolor='none')

        ax.add_patch(rect)

    plt.show()

    # Only one iteration for testing
    break

Seems to be the same as this question from half an hour earlier. Maybe you can team up with the questioner of that question and together you can come up with a question that has a minimal reproducible example and allows to understand the actual issue. — ImportanceOfBeingErnest

bennyOoO · Accepted Answer · 2019-04-25T09:33:30

I found the solution myself. As it turns out, when Open Images is loaded through the TensorFlow Datasets API, the bounding-box coordinates come in a different order than the one documented on the dataset's website.
The website describes the order of the four values for each box as:
xMin, xMax, yMin, yMax.
However, the order in the TF Datasets API is yMin, xMin, yMax, xMax. I found this out by looking up the image ID of a single image in the annotations.csv file from the website and comparing the values. The only step left to get absolute pixel coordinates for the boxes is to multiply the x values by the width of the image and the y values by its height.
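
As a minimal sketch of that conversion, assuming the yMin, xMin, yMax, xMax order described above: the helper name `tfds_bbox_to_pixel_rect` is hypothetical and not part of TensorFlow Datasets. It takes one normalized box plus the image size and returns the top-left corner, width, and height in pixels.

```python
def tfds_bbox_to_pixel_rect(bbox, width, height):
    """Convert one normalized box to pixel-space (x, y, w, h).

    Hypothetical helper. Assumes the order yMin, xMin, yMax, xMax
    with every value normalized to the range [0, 1].
    """
    y_min, x_min, y_max, x_max = (float(v) for v in bbox)
    x = x_min * width                # left edge in pixels
    y = y_min * height               # top edge in pixels
    w = (x_max - x_min) * width      # box width in pixels
    h = (y_max - y_min) * height     # box height in pixels
    return x, y, w, h


# Elephant example from the question: image is 682 px wide, 1024 px high.
x, y, w, h = tfds_bbox_to_pixel_rect(
    [0.03875, 0.188732, 0.954375, 0.979343], width=682, height=1024
)
print(x, y, w, h)
```

The resulting tuple can be passed to matplotlib as `patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='r', facecolor='none')`. With the default `imshow` orientation the origin is the top-left corner, so no further flipping of the y values should be needed.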
