3
votes

I have fine-tuned a faster_rcnn_resnet101 model from the Model Zoo to detect my custom objects. I split the data into a train and an eval set and used them in the config file during training. Now that training has completed, I want to test my model on unseen data (I call it the test data). I tried a couple of functions but cannot figure out for certain which code from TensorFlow's API I should use to evaluate performance on the test dataset. Below are the things that I tried:

  1. I used the object_detection/metrics/offline_eval_map_corloc.py function to get an evaluation on the test dataset. The code runs fine, but I get negative values of AR and AP for large and medium sized bounding boxes (a quick sanity check of the test record is sketched at the end of this question).

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.601
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.543
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

Now, I know that mAP and AR can't be negative, so something is wrong. Why do I see negative values when I run the offline evaluation on the test dataset?

The commands that I used to run this pipeline are:

SPLIT=test

echo "
label_map_path: '/training_demo/annotations/label_map.pbtxt'
tf_record_input_reader: { input_path: '/training_demo/Predictions/test.record' }
" > /training_demo/${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt

echo "
metrics_set: 'coco_detection_metrics'
" > /training_demo/${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt 

python object_detection/metrics/offline_eval_map_corloc.py \
  --eval_dir='/training_demo/test_eval_metrics' \
  --eval_config_path='/training_demo/test_eval_metrics/test_eval_config.pbtxt' \
  --input_config_path='/training_demo/test_eval_metrics/test_input_config.pbtxt'

  2. I also tried object_detection/legacy/eval.py, but the evaluation metrics come out negative:

DetectionBoxes_Recall/AR@100 (medium): -1.0
DetectionBoxes_Recall/AR@100 (small): -1.0
DetectionBoxes_Precision/mAP@.50IOU: -1.0
DetectionBoxes_Precision/mAP (medium): -1.0
etc.

I used the following pipeline:

python eval.py \
    --logtostderr \
    --checkpoint_dir=trained-inference-graphs/output_inference_graph/ \
    --eval_dir=test_eval_metrics \
    --pipeline_config_path=training/faster_rcnn_resnet101_coco-Copy1.config

The eval_input_reader in faster_rcnn_resnet101_coco-Copy1.config points to the test TFRecord, which contains the ground truth and detection information.

  3. I also tried object_detection/utils/object_detection_evaluation to get the evaluation. This is no different from the first approach, because it uses the same underlying function - evaluator.evaluate().

I would appreciate any help on this.
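
As a sanity check on the record used in the first attempt, a sketch along the following lines can confirm that test.record carries both ground-truth and detection features (offline_eval_map_corloc.py reads both from the TFRecord itself). It is only illustrative: it assumes the TF 1.x record iterator and the standard Object Detection API feature-key prefixes, and the path is the one from the pipeline above.

import tensorflow as tf

TEST_RECORD = '/training_demo/Predictions/test.record'  # path taken from the pipeline above

# Read the first serialized example and list the feature keys it carries.
record = next(tf.python_io.tf_record_iterator(TEST_RECORD))  # tf.compat.v1.python_io in TF 2.x
example = tf.train.Example()
example.ParseFromString(record)

keys = sorted(example.features.feature.keys())
print('groundtruth keys:', [k for k in keys if k.startswith('image/object/')])
print('detection keys:  ', [k for k in keys if k.startswith('image/detection/')])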

3
A couple of unit tests and some investigation point to the use of a wrong category mapping (label map) in the data. For example, if the label map does not contain a class 4 but, due to an error in the data, the ground truth contains a class 4, then the metric values will be -1.0. – Manish Rai
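
To verify that hypothesis, a minimal sketch along these lines can compare the ground-truth class ids in the test record against the label map. The paths are the ones from the question, and it assumes the standard Object Detection API helpers and feature keys (label_map_util.get_label_map_dict and image/object/class/label); adjust if your records use different keys.

import tensorflow as tf
from object_detection.utils import label_map_util

LABEL_MAP_PATH = '/training_demo/annotations/label_map.pbtxt'
TEST_RECORD = '/training_demo/Predictions/test.record'

# Class ids that the label map actually defines.
valid_ids = set(label_map_util.get_label_map_dict(LABEL_MAP_PATH).values())

# Collect any ground-truth ids that the label map does not cover.
unknown_ids = set()
for record in tf.python_io.tf_record_iterator(TEST_RECORD):
    example = tf.train.Example()
    example.ParseFromString(record)
    labels = example.features.feature['image/object/class/label'].int64_list.value
    unknown_ids.update(set(labels) - valid_ids)

print('class ids missing from the label map:', sorted(unknown_ids) or 'none')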

3 Answers

1
votes

The evaluation metrics are in COCO format, so you can refer to the COCO API for the meaning of these values.

As specified in the COCO API code, -1 is the default value when a category is absent. In your case, all the detected objects belong only to the 'small' area range. The 'small', 'medium' and 'large' area categories depend on the number of pixels a box covers, as specified here.
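
For illustration, here is a small sketch (mine, not from the answer) of the default area thresholds that pycocotools' cocoeval.py uses to bucket boxes by pixel area; if a bucket contains no boxes to evaluate, its AP/AR stays at the default -1.

def coco_area_bucket(width, height):
    """Return the COCO size bucket for a box of the given pixel dimensions."""
    area = width * height          # area in pixels^2
    if area < 32 ** 2:             # below 1024 px^2
        return 'small'
    if area < 96 ** 2:             # below 9216 px^2
        return 'medium'
    return 'large'

print(coco_area_bucket(20, 30))    # small
print(coco_area_bucket(60, 60))    # medium
print(coco_area_bucket(120, 90))   # large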

0
votes

For me, I just ran model_main.py once and changed the eval_input_reader in pipeline.config to point to the test dataset. But I am not sure whether this is how it should be done.

python model_main.py \
    --alsologtostderr \
    --run_once \
    --checkpoint_dir=$path_to_model \
    --model_dir=$path_to_eval \
    --pipeline_config_path=$path_to_config

pipeline.config

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 721 # no of test images
  num_visualizations: 10 # no of visualizations for tensorboard
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/path/to/test-data.record"
  }
  label_map_path: "/path/to/label_map.pbtxt"
  shuffle: true
  num_readers: 1
}

Also, for me there was no difference in mAP between the validation and the test dataset, so I am not sure whether a split into training, validation and test data is actually necessary.

0
votes
!python eval.py \
    --logtostderr \
    --pipeline_config_path=$path_to_pipeline_config \
    --checkpoint_dir=$path_to_checkpoint_dir \
    --eval_dir=eval/

You can find eval.py in the object_detection/legacy folder.