Yes, in the tutorial the variable output_dict can be used to achieve that. Notice all the variables passed into the function vis_util.visualize_boxes_and_labels_on_image_array: they contain the boxes, scores, etc.
First you need to get the image shape, as the box coordinates are in normalized form (values between 0 and 1, relative to the image size).
img_height, img_width, img_channel = image_np.shape
Then transform all the box coordinates to the absolute format
absolute_coord = []
THRESHOLD = 0.7  # adjust your threshold here
N = len(output_dict['detection_boxes'])
for i in range(N):
    # skip detections below the confidence threshold
    if output_dict['detection_scores'][i] < THRESHOLD:
        continue
    # each box is [ymin, xmin, ymax, xmax] in normalized coordinates
    box = output_dict['detection_boxes'][i]
    ymin, xmin, ymax, xmax = box
    x_up = int(xmin * img_width)
    y_up = int(ymin * img_height)
    x_down = int(xmax * img_width)
    y_down = int(ymax * img_height)
    absolute_coord.append((x_up, y_up, x_down, y_down))
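If you want to try the conversion without running the detector, here is a minimal self-contained sketch of the same idea. The helper name to_absolute_boxes and the fake_output dictionary are made up for illustration; they only assume that the boxes are normalized [ymin, xmin, ymax, xmax] rows and that the scores array has one entry per box, as in the tutorial's output_dict.

```python
import numpy as np

def to_absolute_boxes(boxes, scores, img_height, img_width, threshold=0.7):
    """Hypothetical helper: turn normalized [ymin, xmin, ymax, xmax] boxes
    into pixel tuples (x_min, y_min, x_max, y_max), dropping low scores."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # keep only detections at or above the confidence threshold
    kept = boxes[scores >= threshold]
    # y values scale with the height, x values with the width
    scale = np.array([img_height, img_width, img_height, img_width])
    pixels = (kept * scale).astype(int)
    # reorder from (ymin, xmin, ymax, xmax) to (x_min, y_min, x_max, y_max)
    return [(int(x0), int(y0), int(x1), int(y1)) for y0, x0, y1, x1 in pixels]

# Synthetic stand-in for the tutorial's output_dict (assumed layout)
fake_output = {
    'detection_boxes': np.array([[0.25, 0.5, 0.75, 1.0],
                                 [0.0, 0.0, 1.0, 1.0]]),
    'detection_scores': np.array([0.9, 0.3]),
}

coords = to_absolute_boxes(fake_output['detection_boxes'],
                           fake_output['detection_scores'],
                           img_height=200, img_width=100)
print(coords)
```

The second box is dropped because its score is below the threshold, so only one tuple comes back.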
Then you can use numpy slices to get the image area within the bounding box
bounding_box_img = []
for c in absolute_coord:
    # rows are y (height), columns are x (width)
    bounding_box_img.append(image_np[c[1]:c[3], c[0]:c[2], :])
Then just save all the numpy arrays in bounding_box_img as images. Depending on the library you save with, you may need to rearrange the array first, since each crop has shape [img_height, img_width, img_channel]. The score array is also what lets you filter out detections with low confidence scores, as the THRESHOLD check does.
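For the saving step, here is one possible sketch using Pillow, which is my choice and not something the tutorial prescribes. Image.fromarray accepts uint8 arrays in [height, width, channel] layout directly, so no reshaping is needed with it. The save_crops helper, the file naming and the dummy image are all made up for illustration.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_crops(crops, out_dir, prefix="crop"):
    """Hypothetical helper: write each (H, W, C) crop to out_dir as a PNG."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, crop in enumerate(crops):
        # a degenerate box gives an empty slice, which cannot be saved
        if crop.size == 0:
            continue
        path = os.path.join(out_dir, f"{prefix}_{i}.png")
        Image.fromarray(crop.astype(np.uint8)).save(path)
        paths.append(path)
    return paths

# Dummy image standing in for image_np, plus one crop taken with numpy slices
image_np = np.zeros((200, 100, 3), dtype=np.uint8)
crops = [image_np[50:150, 50:100, :]]

out_dir = tempfile.mkdtemp()
saved = save_crops(crops, out_dir)
print(saved)
```

If you save with OpenCV's cv2.imwrite instead, keep in mind that it expects BGR channel order, so an RGB array would need its channels swapped first.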
PS: I might have mixed up img_height and img_width, but this should give you a starting point.