3
votes

I'm using the TensorFlow Object Detection API to create a custom object detector. I'm using the COCO-trained models for transfer learning.

I trained it using Faster R-CNN ResNet and got very accurate results, but the inference speed of this model is very slow. I tried training it with SSD MobileNet V2, which is very fast, but I'm getting very low accuracy with this model. Is there anything I can change in the config file to increase the accuracy of the model? Or will the SSD model not give very accurate results because it's a lightweight model? Here's the config file I'm using right now. (I trained it on ~150 images for 10000 steps.)

```
model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
        reduce_boxes_in_lowest_layer: true
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 900
        width: 400
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_inception_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 12
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/content/models/research/pretrained_model/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 10000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
```
3
You can consider increasing num_steps and giving it a try. But I have tried this exact thing too. My results showed that Faster R-CNN was way ahead in terms of accuracy (it was built with accuracy in mind), while the SSD models were faster but less accurate. – Pramesh Bajracharya

3 Answers

2
votes

It is very difficult to get high accuracy from a model that was designed to run on mobile phones.

My suggestion is to use the high accuracy model and improve the inference time. Convert the model to TensorRT.

https://github.com/tensorflow/tensorrt/tree/master/tftrt/examples/object_detection

1
votes

You can increase the number of steps:

num_steps: 2000000

If the loss then settles at around 1 or 2 and the predictions are still not satisfying, more training with this model is unlikely to help, and you can try some other model. You could also refer to the COCO-trained models and choose one with a higher COCO mAP[^1] and a lower Speed (ms).

You can try different models and see what works best for your application.

If the problem still persists, you could try increasing the number of training images.
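
To put the step counts in context, here is a small arithmetic sketch using the numbers from the question (batch_size 12, roughly 150 images). The helper name `passes_over_dataset` is made up for illustration, and it assumes every step consumes one full batch.

```python
# How many times does training cycle through the dataset?
# Assumption: each step processes exactly `batch_size` images.

def passes_over_dataset(num_steps, batch_size, num_images):
    """Approximate number of passes (epochs) over the training set."""
    return num_steps * batch_size / num_images

# Values taken from the question's config and description.
current_passes = passes_over_dataset(10_000, 12, 150)
# Same dataset with the larger num_steps suggested above.
longer_passes = passes_over_dataset(2_000_000, 12, 150)

print(f"10000 steps   -> {current_passes:.0f} passes over the data")
print(f"2000000 steps -> {longer_passes:.0f} passes over the data")
```

With a dataset this small, any step count means many passes over the same images, which may be one reason why adding training images is worth trying.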

0
votes

There are many things you can improve.

Typically, you want to use a small input size for SSD, e.g. 320x320, which should be at least 3x faster than your current input size. The current 900x400 input also looks strange.
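
As a rough check of that speed estimate, the sketch below compares the pixel counts of the two input sizes. It assumes the convolutional cost grows roughly linearly with the number of input pixels, which is only an approximation; real latency depends on the backbone and hardware.

```python
# Rough cost comparison between two input resolutions.
# Assumption: convolutional work scales roughly with pixel count.

def pixel_count(height, width):
    """Number of pixels in an input of the given size."""
    return height * width

current = pixel_count(900, 400)  # fixed_shape_resizer in the question
small = pixel_count(320, 320)    # size suggested above

ratio = current / small
print(f"{current} px vs {small} px -> about {ratio:.2f}x more pixels")
```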

In addition, you only have 1 foreground class. You typically want to double-check the required anchors and min_size/max_size (min_scale/max_scale in this config), all of which relate to the prior boxes used in SSD. I am pretty sure that the default config, which is for MS COCO, does not fit many tasks well. For example, in a car plate detection task the plate width is much greater than the height, so you can safely drop those aspect_ratios <= 1.

In addition, min_size and max_size are also important. If you use the default settings, you will have anchor boxes even bigger than your input image. Is this something you expect? If not, you want to adjust these settings too.
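
To see which anchors end up larger than the image, here is a minimal sketch of the usual SSD anchor sizing: scales are spaced linearly between min_scale and max_scale, and each box has width `scale * sqrt(ar)` and height `scale / sqrt(ar)`, in units where 1.0 is the full image side. The function name `ssd_anchor_shapes` is hypothetical, and the sketch deliberately ignores `reduce_boxes_in_lowest_layer` and the extra interpolated-scale box, so treat it as an approximation of what the real generator produces.

```python
import math

def ssd_anchor_shapes(num_layers, min_scale, max_scale, aspect_ratios):
    """Return, per layer, a list of (aspect_ratio, width, height).

    Sizes are relative to the image side (1.0 == full image).
    Simplified: no reduce_boxes_in_lowest_layer, no interpolated-scale box.
    """
    shapes = []
    denom = max(num_layers - 1, 1)  # avoid division by zero for one layer
    for i in range(num_layers):
        # Linearly spaced scale for layer i.
        scale = min_scale + (max_scale - min_scale) * i / denom
        layer = []
        for ar in aspect_ratios:
            width = scale * math.sqrt(ar)
            height = scale / math.sqrt(ar)
            layer.append((ar, width, height))
        shapes.append(layer)
    return shapes

# Anchor settings copied from the question's config.
shapes = ssd_anchor_shapes(6, 0.2, 0.95, [1.0, 2.0, 0.5, 3.0, 0.3333])

# Collect (layer index, aspect ratio) pairs whose box exceeds the image.
oversized = [
    (i, ar)
    for i, layer in enumerate(shapes)
    for ar, w, h in layer
    if w > 1.0 or h > 1.0
]

for i, ar in oversized:
    print(f"layer {i}: aspect_ratio {ar} exceeds the image")
```

Under these simplifications, the boxes with aspect ratios 3.0 and 0.3333 already exceed the image from the fourth layer on, so lowering max_scale or trimming aspect_ratios is worth considering.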

Furthermore, you want to dig into which data augmentations fit your problem best. Support for auto augmentation was also added recently.

Finally, you can always boost your performance by using new losses, e.g. focal loss for classification.
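
For reference, here is a minimal pure-Python sketch of the sigmoid focal loss for a single anchor and class, using the commonly cited defaults gamma = 2 and alpha = 0.25. It is an illustration only, not the API's implementation, and it is not numerically hardened for very large logits. As far as I know, the Object Detection API selects this loss with `weighted_sigmoid_focal` under `classification_loss`, but check the proto definitions for your version.

```python
import math

def sigmoid_focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    logit:  raw score before the sigmoid
    target: 1 for the foreground class, 0 for background
    """
    p = 1.0 / (1.0 + math.exp(-logit))           # predicted probability
    p_t = p if target == 1 else 1.0 - p          # probability of the true label
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct background anchor contributes far less loss
# than a background anchor that is wrongly scored as foreground.
easy_negative = sigmoid_focal_loss(-4.0, 0)
hard_negative = sigmoid_focal_loss(2.0, 0)
print(f"easy negative: {easy_negative:.6f}, hard negative: {hard_negative:.6f}")
```

Because easy background anchors are down-weighted automatically, focal loss is typically used in place of hard example mining rather than together with it.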