
I used the TensorFlow Object Detection API to detect multiple objects in my videos. However, I have been struggling to figure out how to write the resulting detections to a text/CSV/XML file (basically the bounding box information, the frame number in the image sequence, and the confidence of each bounding box).

I've seen several answers on Stack Overflow and GitHub, but most of them were either vague or not quite what I'm looking for.

Shown below is the last part of the detection code. I know that detection_boxes and detection_scores are what I need, but I cannot figure out how to write them to a text file, and how to write only the final detections that are drawn on the images rather than ALL detection boxes.

for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # The array-based representation of the image will be used later in order
    # to prepare the result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)

1 Answer


You can try the following code:

image = Image.open(image_path)
# The array-based representation of the image will be used later in order
# to prepare the result image with boxes and labels on it.
image_np = load_image_into_numpy_array(image)
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)

image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
scores = detection_graph.get_tensor_by_name('detection_scores:0')
classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

(boxes, scores, classes, num_detections) = sess.run(
    [boxes, scores, classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})

width = 1024
height = 600
threshold = 0.5

temp = []  # list to store scores greater than the threshold

# Iterate through all detections and keep those scoring above the threshold.
for index, value in enumerate(classes[0]):
    if scores[0, index] > threshold:
        temp.append(scores[0, index])

# Similarly, classes[0, index] will give you the class of the bounding box detection.

# Actual detection.
output_dict = run_inference_for_single_image(image_np, detection_graph)

# Print the bounding box coordinates in pixel units.
# Boxes are [ymin, xmin, ymax, xmax] in normalized coordinates.
for box, score in zip(output_dict['detection_boxes'], output_dict['detection_scores']):
    if score > threshold:
        print(box[1] * width, box[0] * height, box[3] * width, box[2] * height)

The above snippet gives you the bounding box coordinates and the detection scores, and you can use a minimum threshold to filter out unnecessary detections. I hope this helps. Also, I could not quite understand what you meant by frame number. Could you please elaborate on what you actually mean by this?
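Since you mentioned CSV output, here is a minimal sketch of dumping the filtered detections to a CSV file with Python's built-in csv module. The detections dict below is dummy data standing in for your output_dict, and the file name and frame_number counter are illustrative; in your video loop you would pass the real output_dict and increment frame_number once per frame.

```python
import csv

width, height = 1024, 600
threshold = 0.5

# Dummy detections in the TF Object Detection API layout:
# boxes are [ymin, xmin, ymax, xmax] in normalized coordinates.
detections = {
    'detection_boxes': [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.9, 0.9]],
    'detection_scores': [0.92, 0.31],
    'detection_classes': [1, 2],
}

frame_number = 0  # increment this once per frame when looping over a video

with open('detections.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['frame', 'class', 'score', 'xmin', 'ymin', 'xmax', 'ymax'])
    for box, score, cls in zip(detections['detection_boxes'],
                               detections['detection_scores'],
                               detections['detection_classes']):
        if score > threshold:  # keep only the confident detections
            ymin, xmin, ymax, xmax = box
            writer.writerow([frame_number, cls, round(score, 3),
                             xmin * width, ymin * height,
                             xmax * width, ymax * height])
```

Open the file in append mode ('a') instead if you want one CSV accumulating rows across all frames, writing the header only once.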

Please let me know if you face any issues.