I need to detect large numbers of objects from two classes in a single image. I've had some success with the TensorFlow Object Detection API by retraining the faster_rcnn_inception_resnet_v2_atrous_coco network from the Object Detection Model Zoo, using the following config file:
model {
  faster_rcnn {
    num_classes: 2
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 2000
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 1000
        max_total_detections: 1000
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/path/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "/path/train.record"
  }
  label_map_path: "/path/label_map.pbtxt"
}
eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/path/val.record"
  }
  label_map_path: "/path/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
However, on an Nvidia M10 with 8 GB of memory, I only get detections on (roughly) the top half of the image:
This pattern is consistent across many images. Some images have a few bounding boxes lower down, but none have bounding boxes accurately distributed throughout the image. My first thought was that it was a memory problem, so I ran the detection on a GPU with more memory (an Nvidia V100 with 32 GB). I changed the config file to raise first_stage_max_proposals from 2000 to 4000 and max_detections_per_class/max_total_detections from 1000 to 2000 (on the 8 GB GPU these settings led to an Aborted (core dumped) error). The results were only marginally better:
I then tried raising first_stage_max_proposals to 8000 and max_detections_per_class/max_total_detections to 4000, but this led to an Aborted (core dumped) error even on the 32 GB GPU.
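For reference, the limit changes described above can be applied to the config text programmatically rather than by hand. The following is only a minimal sketch: the helper names `set_scalar_field` and `scale_detection_limits` are hypothetical, it works on the raw text with a regular expression instead of the Object Detection API's own config utilities, and it assumes each field appears as a plain `name: integer` pair as in the config above.

```python
import re


def set_scalar_field(config_text, field, value):
    """Replace every `field: <integer>` occurrence in a text-format config.

    Hypothetical helper: plain text substitution, not a protobuf parser.
    """
    pattern = re.compile(r"(\b" + re.escape(field) + r"\s*:\s*)\d+")
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), config_text)
    if count == 0:
        raise KeyError(field + " not found in config text")
    return new_text


def scale_detection_limits(config_text, proposals, detections):
    """Set the three limits that were changed between runs in the question."""
    overrides = {
        "first_stage_max_proposals": proposals,
        "max_detections_per_class": detections,
        "max_total_detections": detections,
    }
    for field, value in overrides.items():
        config_text = set_scalar_field(config_text, field, value)
    return config_text


# Small excerpt standing in for the full pipeline config.
SAMPLE = """
first_stage_max_proposals: 2000
max_detections_per_class: 1000
max_total_detections: 1000
"""

if __name__ == "__main__":
    print(scale_detection_limits(SAMPLE, proposals=4000, detections=2000))
```

This makes it easy to sweep the limits (2000/1000, 4000/2000, ...) and record which combinations fit in GPU memory.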
My questions are:
1) Are these the best config settings for detecting large numbers of objects in a single image?
2) Is there a better network than faster_rcnn_inception_resnet_v2_atrous_coco for this specific task?
3) Is there an entirely different approach that's better suited to this problem?
I've considered splitting the image up into smaller images and running it on those, but if possible I'd like to keep it as one image, as accurate counts of the objects are important to my application and splitting the objects along some dividing line might lead to inaccurate counts.
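To make the splitting idea concrete, here is a rough sketch of one common way to limit double counting: cut the image into overlapping tiles, shift each tile's boxes back into full-image coordinates, and suppress duplicates with a greedy per-class non-max suppression. This is not something I have run against my model. All function names are hypothetical, boxes are assumed to be `(ymin, xmin, ymax, xmax)` in pixels, and the detector itself is left out.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    if tile >= length:
        return [0]
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the image edge
    return starts


def iou(a, b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    ih = min(a[2], b[2]) - max(a[0], b[0])
    iw = min(a[3], b[3]) - max(a[1], b[1])
    if ih <= 0 or iw <= 0:
        return 0.0
    inter = ih * iw
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def merge_tile_detections(tile_results, iou_threshold=0.5):
    """Merge per-tile detections into one list in full-image coordinates.

    tile_results: iterable of (y0, x0, detections), where (y0, x0) is the
    tile's top-left corner and detections is a list of (box, score, label)
    in tile-local pixel coordinates.
    """
    shifted = []
    for y0, x0, detections in tile_results:
        for (ymin, xmin, ymax, xmax), score, label in detections:
            box = (ymin + y0, xmin + x0, ymax + y0, xmax + x0)
            shifted.append((box, score, label))
    # Greedy NMS: keep the highest-scoring box, drop same-class overlaps.
    shifted.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in shifted:
        if all(k[2] != label or iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score, label))
    return kept
```

With an overlap at least as large as the biggest object, every object should appear whole in at least one tile, so the count would come from `len(merge_tile_detections(...))`. Objects cut by a tile edge can still produce partial boxes with a low IoU against the full box, so the threshold would need tuning.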
Thanks!


max_proposals and max_total_detections 800. - Vedanshu