
I created a SageMaker object detection training job and then an endpoint from the resulting model. I have just two classes. However, when I use the endpoint for prediction, I get multiple rows in the response, like so:

{"prediction": [
[1.0, 0.632090151309967, 0.0, 0.003549516201019287, 1.0, 1.0], 
[0.0, 0.4135304093360901, 0.0, 0.006693154573440552, 1.0, 0.9729366302490234], 
[0.0, 0.018929673358798027, 0.0, 0.044431887567043304, 0.6495294570922852, 0.23290297389030457], 
[0.0, 0.01802791841328144, 0.0, 0.11557215452194214, 0.6625108122825623, 0.7412691712379456], 
[0.0, 0.015324527397751808, 0.0, 0.267954021692276, 0.6784608960151672, 0.39592066407203674], 
[0.0, 0.013910820707678795, 0.0, 0.8590829372406006, 0.7399784326553345, 1.0], 
[1.0, 0.013243389315903187, 0.928236186504364, 0.0, 1.0, 0.07348344475030899], 
[0.0, 0.012794392183423042, 0.9662157893180847, 0.0, 1.0, 0.057823698967695236], 
[0.0, 0.011968772858381271, 0.0, 0.9265779256820679, 0.04517384618520737, 1.0], 
[0.0, 0.011287822388112545, 0.953392744064331, 0.9526442885398865, 1.0, 1.0], 
[1.0, 0.01005781814455986, 0.8989022970199585, 0.9481537342071533, 1.0, 1.0]
]
}

Why are there multiple rows in the response?

Without more of your code I'm only speculating, but this looks like it's returning multi-object classification and not object categorization. Is that possible? - Matthew Arthur
Since I am using the AWS SageMaker console, I do not have much code. I just use the train, validation and annotation channels to provide the input, then create the training job and the subsequent model and endpoint. Since the prediction result does show class labels, I think some categorization is happening. As this is the first time I am using object detection in SageMaker, I want to know what a typical prediction looks like for a scenario with two classes and two objects in an image. Thanks. - Droy

1 Answer


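Each row corresponds to one detected object, so the number of rows reflects how many candidate detections the model found, not how many classes it was trained on. Quoting from https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-in-formats.html: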

Each row in this .json file contains an array that represents a detected object. Each of these object arrays consists of a list of six numbers. The first number is the predicted class label. The second number is the associated confidence score for the detection. The last four numbers represent the bounding box coordinates [xmin, ymin, xmax, ymax].
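
In the response from the question, only the first two rows have a meaningful confidence score (about 0.63 and 0.41); the rest are around 0.01 to 0.02. A common step is therefore to keep only the detections whose score clears a threshold. Below is a minimal Python sketch, assuming the response body is JSON shaped like the one in the question. The function names `filter_detections` and `to_pixels` and the 0.2 threshold are illustrative choices, not part of any SageMaker API. The pixel conversion assumes the box coordinates are normalized to the image size, which is how I read the documentation.

```python
import json


def filter_detections(response_body, threshold=0.2):
    """Keep detections whose confidence score is at least `threshold`.

    `threshold` is a hypothetical default; tune it for your own model.
    """
    rows = json.loads(response_body)["prediction"]
    detections = []
    # Each row is [class_id, score, xmin, ymin, xmax, ymax].
    for class_id, score, xmin, ymin, xmax, ymax in rows:
        if score >= threshold:
            detections.append({
                "class_id": int(class_id),
                "score": score,
                "box": (xmin, ymin, xmax, ymax),
            })
    return detections


def to_pixels(box, width, height):
    """Scale a box to pixels, assuming coordinates normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


# First three rows of the response from the question.
sample = json.dumps({"prediction": [
    [1.0, 0.632090151309967, 0.0, 0.003549516201019287, 1.0, 1.0],
    [0.0, 0.4135304093360901, 0.0, 0.006693154573440552, 1.0, 0.9729366302490234],
    [0.0, 0.018929673358798027, 0.0, 0.044431887567043304, 0.6495294570922852, 0.23290297389030457],
]})

kept = filter_detections(sample, threshold=0.2)
for det in kept:
    print(det["class_id"], det["score"], to_pixels(det["box"], 640, 480))
```

With a threshold of 0.2, only the first two rows survive, one per class. The 640 by 480 image size is a placeholder; use the dimensions of the image you sent to the endpoint.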