
I have an object detection model built with the TensorFlow Object Detection API using Faster R-CNN. The model detects objects that are clearly visible, but fails to detect objects that are tiny/small in size or at a larger distance. Does anything need to be changed in the Faster R-CNN config file? If yes, then what is it? And if not, then how can this model detect tiny objects? Below is the Faster R-CNN config file for reference.

model {
  faster_rcnn {
    num_classes: 4
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 3000
            learning_rate: .00002
          }
          schedule {
            step: 15000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "C:/multi_cat_3/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true

  num_steps: 20000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}


train_input_reader: {
  tf_record_input_reader {
    input_path: "C:/multi_cat_3/models/research/object_detection/train.record"
  }
  label_map_path: "C:/multi_cat_3/models/research/object_detection/training/labelmap.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1311
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "C:/multi_cat_3/models/research/object_detection/test.record"
  }
  label_map_path: "C:/multi_cat_3/models/research/object_detection/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}

2 Answers


I guess you need to train the model on images similar to the ones you want to test on. In your case, collect images like those you will be testing on and train the model on that dataset. Capture the images at the desired height or distance and label them as accurately as possible.

I think this will do.


COCO metrics define object sizes as:

small objects: area < 32² (1024 px²)
medium objects: 32² < area < 96² (9216 px²)
large objects: area > 96²
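These thresholds can be checked with a quick sketch (the helper name is my own, not part of any library):

```python
def coco_size_bucket(width, height):
    """Classify an object by COCO's area buckets:
    small < 32^2 px^2, medium < 96^2 px^2, large otherwise."""
    area = width * height
    if area < 32 ** 2:   # 1024 px^2
        return "small"
    if area < 96 ** 2:   # 9216 px^2
        return "medium"
    return "large"

print(coco_size_bucket(30, 30))    # a 30x30 object counts as "small"
print(coco_size_bucket(100, 100))  # "large"
```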

So, whichever objects you want to detect, tile/cut the main image into several parts until your object appears larger relative to the entire image resolution.

Select a config that won't resize your training images below the desired size (to keep the objects larger relative to the entire image).

Consider the config below. For example, if your original image is 2000x2000 and the object is 30x30, then with the following config the object will be resized to smaller than 30x30, because the training images are resized to fit within 600x1024 before they are passed to the model for training.

But if the image is cut into 4x4 tiles (500x500 each), the config below will resize each tile to 600x600 and the object will end up larger than 30x30.

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
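The effect described above can be sketched with the resizer's scaling rule (my own reimplementation of the math, not the TF OD API code):

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Approximate the output size of keep_aspect_ratio_resizer:
    scale so the smaller side reaches min_dimension, unless the larger
    side would then exceed max_dimension, in which case scale so the
    larger side equals max_dimension."""
    scale = min_dimension / min(height, width)
    if scale * max(height, width) > max_dimension:
        scale = max_dimension / max(height, width)
    return round(height * scale), round(width * scale)

# Full 2000x2000 image shrinks to 600x600, so a 30x30 object becomes ~9x9:
print(keep_aspect_ratio_size(2000, 2000))  # (600, 600)
# A 500x500 tile is upscaled to 600x600, so the same object grows to ~36x36:
print(keep_aspect_ratio_size(500, 500))    # (600, 600)
```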
PS: At inference time, the same cut/tiled images need to be passed through the model, which may increase inference time.
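A minimal tiling sketch with NumPy (the tile size and function name are my own assumptions, not from the TF OD API):

```python
import numpy as np

def tile_image(image, tile_h=500, tile_w=500):
    """Cut an HxWxC image into non-overlapping tiles.
    Assumes the image dimensions are divisible by the tile size;
    real pipelines usually pad or overlap tiles at the borders."""
    tiles = []
    h, w = image.shape[:2]
    for y in range(0, h, tile_h):
        for x in range(0, w, tile_w):
            tiles.append((y, x, image[y:y + tile_h, x:x + tile_w]))
    return tiles

image = np.zeros((2000, 2000, 3), dtype=np.uint8)
tiles = tile_image(image)
print(len(tiles))  # 16 tiles of 500x500
# At inference, run detection on each tile and shift the resulting
# boxes by (x, y) to map them back into full-image coordinates.
```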

Another alternative is to choose a config whose image_resizer settings match your original image resolution, but this requires higher specs (CPU/GPU/TPU) to handle the training.

HTH!