Proper configuration settings for the TensorFlow Object Detection API to add a class and do transfer learning
After reading "Retrain Tensorflow Object detection API" and related material, it is still unclear to me how to do transfer learning with the API.
I'm looking for the proper way to add a class to a trained model, for example SSD with MobileNet v1.
The methods I've seen for the Object Detection API involve making the following changes.
In the pipeline config file:
- Change num_classes: 90 to num_classes: 1
- Change fine_tune_checkpoint: to "../yourlocalpath/model.ckpt"
- Keep from_detection_checkpoint: true
- Change train_input_reader/ input_path: to "../yourtrainimagepath/train.record"
- Change train_input_reader/ label_map_path to "../yourlocalpath/classes.pbtxt"
- Change eval_input_reader / input_path to "../yourtestimagepath/test.record"
- Change eval_input_reader / label_map_path to "../yourlocalpath/classes.pbtxt"
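For reference, the first two edits in the list above could also be scripted. This is only a minimal sketch using plain-text substitution: the helper name `patch_pipeline_config` and the sample string are hypothetical, and it assumes each field appears in the one-line `key: value` form used by the sample configs. The input_path and label_map_path fields occur in both readers, so they are left to manual editing here.

```python
import re


def patch_pipeline_config(text, num_classes, checkpoint):
    """Return config text with num_classes and fine_tune_checkpoint replaced.

    Hypothetical helper: it treats the config as plain text and does not
    validate the result against the Object Detection API protos.
    """
    text = re.sub(r"num_classes:\s*\d+", f"num_classes: {num_classes}", text)
    # A function is used as the replacement so that backslashes in the
    # checkpoint path are not interpreted as regex escapes.
    text = re.sub(
        r'fine_tune_checkpoint:\s*"[^"]*"',
        lambda match: f'fine_tune_checkpoint: "{checkpoint}"',
        text,
    )
    return text


sample = 'num_classes: 90\nfine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n'
patched = patch_pipeline_config(sample, 1, "../yourlocalpath/model.ckpt")
print(patched)
```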
Change the file: "../yourlocalpath/classes.pbtxt" to only contain:
item {
  id: 1
  name: 'some_new_class'
}
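A label map like the one above can be generated rather than typed by hand. The following is a minimal sketch; the function name `label_map_text` is hypothetical, and it only emits the `id` and `name` fields shown above.

```python
def label_map_text(class_names, first_id=1):
    """Build label-map text with one item per class, ids starting at first_id."""
    items = []
    for offset, name in enumerate(class_names):
        items.append(
            "item {\n"
            f"  id: {first_id + offset}\n"
            f"  name: '{name}'\n"
            "}\n"
        )
    # Items are separated by a blank line.
    return "\n".join(items)


print(label_map_text(["some_new_class"]))
```

The printed text could then be saved as classes.pbtxt.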
I trained on 600 images for 200,000 steps (18 hours), reaching a loss of 1.5.
I achieved over 90% accuracy on the training data but less than 10% on the evaluation data. This was clearly overfitting. My first take was that the model is too complex for a single class and simply memorized the training data. I also noticed that the 90 original classes were no longer detected.
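One thing worth quantifying here is how many times each training image was seen. A rough calculation, assuming the batch_size: 10 from the configs shown later in this post and ignoring the effect of the random crop and flip augmentation (the function name is just for illustration):

```python
def passes_over_dataset(num_steps, batch_size, num_images):
    """Approximate number of times each image is seen during training."""
    return num_steps * batch_size / num_images


# 200,000 steps at batch_size 10 over 600 images.
epochs = passes_over_dataset(200_000, 10, 600)
print(round(epochs))  # 3333
```

Several thousand passes over 600 images is at least consistent with the memorization described above.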
I then changed num_classes to 91 and simply added item { id: 91 name: 'some_new_class' } to the original classes.pbtxt file.
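As a sketch of this step, the new item can be appended programmatically with the next free id. The helper `append_class` is hypothetical and works on the label map as plain text; it assumes every `id:` in the file is written as a plain integer.

```python
import re


def append_class(label_map_text, name, display_name):
    """Append an item whose id is one more than the largest id already present."""
    ids = [int(value) for value in re.findall(r"\bid:\s*(\d+)", label_map_text)]
    new_id = max(ids, default=0) + 1
    new_item = (
        "item {\n"
        f'  name: "{name}"\n'
        f"  id: {new_id}\n"
        f'  display_name: "{display_name}"\n'
        "}\n"
    )
    return label_map_text.rstrip("\n") + "\n" + new_item, new_id


existing = 'item {\n  name: "/m/012xff"\n  id: 90\n  display_name: "toothbrush"\n}\n'
updated, new_id = append_class(existing, "/m/bib", "bib")
print(new_id)  # 91
```

The returned id is the value that num_classes was raised to in this second attempt.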
My results did not improve much (20%). This time I stopped training at around 100,000 steps, but the learning curve had pretty much flattened by that point.
In both cases, I chose not to change the "from_detection_checkpoint: true" setting, because "starting from a detection checkpoint will usually result in a faster training job than a classification checkpoint." Reference:
What is the proper way to train an object detector to detect all objects (old and new)?
I expect that when I run a prediction on an image containing already trained objects in addition to my new object, all of them are found.
Here are the config files used.
1st one with num_classes: 1
# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 10
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/home/adriansr/HoodML/Datasets/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/adriansr/HoodML/Datasets/2016_USATF_Sprint_TrainingDataset/Analyze/train.record"
  }
  label_map_path: "/home/adriansr/HoodML/hoodbibod/training/classes.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1100
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/adriansr/HoodML/Datasets/2016_USATF_Sprint_TrainingDataset/Analyze/test.record"
  }
  label_map_path: "/home/adriansr/HoodML/hoodbibod/training/classes.pbtxt"
  shuffle: false
  num_readers: 1
}
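The note in the config above about num_steps bypassing the learning rate schedule can be checked numerically. This is a rough sketch of a plain exponential decay using the values from the config; it assumes a staircase schedule (the rate only drops at whole multiples of decay_steps), which I believe is the API default but have not verified, and it ignores anything RMSProp does on top of the base rate.

```python
def exponential_decay_lr(step, initial_lr=0.004, decay_steps=800720,
                         decay_factor=0.95, staircase=True):
    """Base learning rate after `step` training steps of exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Assumed behaviour: decay only at whole multiples of decay_steps.
        exponent = step // decay_steps
    return initial_lr * decay_factor ** exponent


print(exponential_decay_lr(199_999))  # 0.004
print(exponential_decay_lr(800_720))
```

Under that assumption the rate stays at 0.004 for the whole 200,000-step run, and the first drop (to roughly 0.0038) would only happen at step 800,720.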
2nd one with num_classes: 91
# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 91
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 10
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/home/adriansr/HoodML/Datasets/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/adriansr/HoodML/Datasets/2016_USATF_Sprint_TrainingDataset/Analyze/train.record"
  }
  label_map_path: "/home/adriansr/HoodML/hoodbibod/training/mscoco_complete_label_map_with_bib.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 1100
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/adriansr/HoodML/Datasets/2016_USATF_Sprint_TrainingDataset/Analyze/test.record"
  }
  label_map_path: "/home/adriansr/HoodML/hoodbibod/training/mscoco_complete_label_map_with_bib.pbtxt"
  shuffle: false
  num_readers: 1
}
classes.pbtxt (used with the 1st config):
item {
  id: 1
  name: 'Bib'
}
mscoco_complete_label_map_with_bib.pbtxt (used with the 2nd config):
item {
  name: "background"
  id: 0
  display_name: "background"
}
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0199g"
  id: 2
  display_name: "bicycle"
}
item {
  name: "/m/0k4j"
  id: 3
  display_name: "car"
}
item {
  name: "/m/04_sv"
  id: 4
  display_name: "motorcycle"
}
item {
  name: "/m/05czz6l"
  id: 5
  display_name: "airplane"
}
item {
  name: "/m/01bjv"
  id: 6
  display_name: "bus"
}
item {
  name: "/m/07jdr"
  id: 7
  display_name: "train"
}
item {
  name: "/m/07r04"
  id: 8
  display_name: "truck"
}
item {
  name: "/m/019jd"
  id: 9
  display_name: "boat"
}
item {
  name: "/m/015qff"
  id: 10
  display_name: "traffic light"
}
item {
  name: "/m/01pns0"
  id: 11
  display_name: "fire hydrant"
}
item {
  name: "12"
  id: 12
  display_name: "12"
}
item {
  name: "/m/02pv19"
  id: 13
  display_name: "stop sign"
}
item {
  name: "/m/015qbp"
  id: 14
  display_name: "parking meter"
}
item {
  name: "/m/0cvnqh"
  id: 15
  display_name: "bench"
}
item {
  name: "/m/015p6"
  id: 16
  display_name: "bird"
}
item {
  name: "/m/01yrx"
  id: 17
  display_name: "cat"
}
item {
  name: "/m/0bt9lr"
  id: 18
  display_name: "dog"
}
item {
  name: "/m/03k3r"
  id: 19
  display_name: "horse"
}
item {
  name: "/m/07bgp"
  id: 20
  display_name: "sheep"
}
item {
  name: "/m/01xq0k1"
  id: 21
  display_name: "cow"
}
item {
  name: "/m/0bwd_0j"
  id: 22
  display_name: "elephant"
}
item {
  name: "/m/01dws"
  id: 23
  display_name: "bear"
}
item {
  name: "/m/0898b"
  id: 24
  display_name: "zebra"
}
item {
  name: "/m/03bk1"
  id: 25
  display_name: "giraffe"
}
item {
  name: "26"
  id: 26
  display_name: "26"
}
item {
  name: "/m/01940j"
  id: 27
  display_name: "backpack"
}
item {
  name: "/m/0hnnb"
  id: 28
  display_name: "umbrella"
}
item {
  name: "29"
  id: 29
  display_name: "29"
}
item {
  name: "30"
  id: 30
  display_name: "30"
}
item {
  name: "/m/080hkjn"
  id: 31
  display_name: "handbag"
}
item {
  name: "/m/01rkbr"
  id: 32
  display_name: "tie"
}
item {
  name: "/m/01s55n"
  id: 33
  display_name: "suitcase"
}
item {
  name: "/m/02wmf"
  id: 34
  display_name: "frisbee"
}
item {
  name: "/m/071p9"
  id: 35
  display_name: "skis"
}
item {
  name: "/m/06__v"
  id: 36
  display_name: "snowboard"
}
item {
  name: "/m/018xm"
  id: 37
  display_name: "sports ball"
}
item {
  name: "/m/02zt3"
  id: 38
  display_name: "kite"
}
item {
  name: "/m/03g8mr"
  id: 39
  display_name: "baseball bat"
}
item {
  name: "/m/03grzl"
  id: 40
  display_name: "baseball glove"
}
item {
  name: "/m/06_fw"
  id: 41
  display_name: "skateboard"
}
item {
  name: "/m/019w40"
  id: 42
  display_name: "surfboard"
}
item {
  name: "/m/0dv9c"
  id: 43
  display_name: "tennis racket"
}
item {
  name: "/m/04dr76w"
  id: 44
  display_name: "bottle"
}
item {
  name: "45"
  id: 45
  display_name: "45"
}
item {
  name: "/m/09tvcd"
  id: 46
  display_name: "wine glass"
}
item {
  name: "/m/08gqpm"
  id: 47
  display_name: "cup"
}
item {
  name: "/m/0dt3t"
  id: 48
  display_name: "fork"
}
item {
  name: "/m/04ctx"
  id: 49
  display_name: "knife"
}
item {
  name: "/m/0cmx8"
  id: 50
  display_name: "spoon"
}
item {
  name: "/m/04kkgm"
  id: 51
  display_name: "bowl"
}
item {
  name: "/m/09qck"
  id: 52
  display_name: "banana"
}
item {
  name: "/m/014j1m"
  id: 53
  display_name: "apple"
}
item {
  name: "/m/0l515"
  id: 54
  display_name: "sandwich"
}
item {
  name: "/m/0cyhj_"
  id: 55
  display_name: "orange"
}
item {
  name: "/m/0hkxq"
  id: 56
  display_name: "broccoli"
}
item {
  name: "/m/0fj52s"
  id: 57
  display_name: "carrot"
}
item {
  name: "/m/01b9xk"
  id: 58
  display_name: "hot dog"
}
item {
  name: "/m/0663v"
  id: 59
  display_name: "pizza"
}
item {
  name: "/m/0jy4k"
  id: 60
  display_name: "donut"
}
item {
  name: "/m/0fszt"
  id: 61
  display_name: "cake"
}
item {
  name: "/m/01mzpv"
  id: 62
  display_name: "chair"
}
item {
  name: "/m/02crq1"
  id: 63
  display_name: "couch"
}
item {
  name: "/m/03fp41"
  id: 64
  display_name: "potted plant"
}
item {
  name: "/m/03ssj5"
  id: 65
  display_name: "bed"
}
item {
  name: "66"
  id: 66
  display_name: "66"
}
item {
  name: "/m/04bcr3"
  id: 67
  display_name: "dining table"
}
item {
  name: "68"
  id: 68
  display_name: "68"
}
item {
  name: "69"
  id: 69
  display_name: "69"
}
item {
  name: "/m/09g1w"
  id: 70
  display_name: "toilet"
}
item {
  name: "71"
  id: 71
  display_name: "71"
}
item {
  name: "/m/07c52"
  id: 72
  display_name: "tv"
}
item {
  name: "/m/01c648"
  id: 73
  display_name: "laptop"
}
item {
  name: "/m/020lf"
  id: 74
  display_name: "mouse"
}
item {
  name: "/m/0qjjc"
  id: 75
  display_name: "remote"
}
item {
  name: "/m/01m2v"
  id: 76
  display_name: "keyboard"
}
item {
  name: "/m/050k8"
  id: 77
  display_name: "cell phone"
}
item {
  name: "/m/0fx9l"
  id: 78
  display_name: "microwave"
}
item {
  name: "/m/029bxz"
  id: 79
  display_name: "oven"
}
item {
  name: "/m/01k6s3"
  id: 80
  display_name: "toaster"
}
item {
  name: "/m/0130jx"
  id: 81
  display_name: "sink"
}
item {
  name: "/m/040b_t"
  id: 82
  display_name: "refrigerator"
}
item {
  name: "83"
  id: 83
  display_name: "83"
}
item {
  name: "/m/0bt_c3"
  id: 84
  display_name: "book"
}
item {
  name: "/m/01x3z"
  id: 85
  display_name: "clock"
}
item {
  name: "/m/02s195"
  id: 86
  display_name: "vase"
}
item {
  name: "/m/01lsmm"
  id: 87
  display_name: "scissors"
}
item {
  name: "/m/0kmg4"
  id: 88
  display_name: "teddy bear"
}
item {
  name: "/m/03wvsk"
  id: 89
  display_name: "hair drier"
}
item {
  name: "/m/012xff"
  id: 90
  display_name: "toothbrush"
}
item {
  name: "/m/bib"
  id: 91
  display_name: "bib"
}
, the num_examples should be set equal to the size of your validation dataset. – danyfang
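To find the value that comment refers to, the records in the evaluation TFRecord file can be counted. The following is a minimal standard-library sketch based on the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload). The function name `count_tfrecords` is hypothetical, and the CRCs are skipped rather than verified, so a corrupted or truncated file may be miscounted.

```python
import struct


def count_tfrecords(path):
    """Count records in a TFRecord file by walking its length-prefixed framing."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC, the payload, and the payload CRC.
            f.seek(4 + length + 4, 1)
            count += 1
    return count
```

Running this on the test.record file named in eval_input_reader would give a candidate value for num_examples in eval_config.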