
When fine-tuning a Faster R-CNN model with the model_main.py script, I deliberately set the evaluation dataset to be the same as the training dataset (TF_DATA), expecting to see the same loss in evaluation as in training. However, the evaluation losses (after roughly 4000 steps) are:

Loss/BoxClassifierLoss/classification_loss = 20588.025
Loss/BoxClassifierLoss/localization_loss = 9474.761
Loss/RPNLoss/localization_loss = 0.10792526
Loss/RPNLoss/objectness_loss = 0.4256882
Loss/total_loss = 30063.021
loss = 30063.021

While the training total loss is:

I0804 14:01:57.539440 139956088792960 basic_session_run_hooks.py:260] loss = 0.27122372, step = 4200

Constants:

RESIZE_SHAPE = (300, 300)
EVALUATE_EVERY = 10000
EPOCHS = 100000

NMS_SCORE_THRESHOLD = 0.1
IOU_THRESHOLD = 0.7
IOU_THRESHOLD2 = 0.6
NMS_SCORE_THRESHOLD2 = 0.01
LR_INIT = 0.0001
BATCH_SIZE = 1
AUGMENTATIONS = ''''''  # empty triple-quoted string: no augmentations
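
The '''+str(...)+''' fragments in the config below are Python string splices: the config is evidently built as one triple-quoted Python string. For context, a hypothetical sketch of how such a template could be rendered and written out (truncated to one field; the pipeline.config path is an assumed name):

RESIZE_SHAPE = (300, 300)  # as in the constants above

# Hypothetical rendering of the templated config shown below:
config_text = '''model {
  faster_rcnn {
    image_resizer {
      fixed_shape_resizer {
        height: ''' + str(RESIZE_SHAPE[0]) + '''
        width: ''' + str(RESIZE_SHAPE[1]) + '''
      }
    }
  }
}'''

with open('pipeline.config', 'w') as f:  # then passed to model_main.py
    f.write(config_text)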

My config file:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: '''+str(RESIZE_SHAPE[0])+'''
        width: '''+str(RESIZE_SHAPE[1])+'''
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: '''+str(NMS_SCORE_THRESHOLD)+'''
    first_stage_nms_iou_threshold: '''+str(IOU_THRESHOLD)+'''
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: true
        dropout_keep_probability: 0.5
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: '''+str(NMS_SCORE_THRESHOLD2)+'''
        iou_threshold: '''+str(IOU_THRESHOLD2)+'''
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: '''+str(BATCH_SIZE)+'''
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: '''+str(LR_INIT)+'''
          schedule {
            step: 900000
            learning_rate: '''+str(LR_INIT)+'''
          }
          schedule {
            step: 1200000
            learning_rate: '''+str(LR_INIT)+'''
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "'''+MODEL_TO_USE+'''/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: false

  '''+AUGMENTATIONS+'''
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "'''+TF_DATA+'''" 
  }
  label_map_path: "'''+CLASS_LABELS+'''"
  shuffle: true 
}

eval_config: {
  num_examples: '''+str(len(test_dataset))+'''
  max_evals: '''+str(EPOCHS // EVALUATE_EVERY)+'''
  min_score_threshold: '''+str(NMS_SCORE_THRESHOLD2)+'''
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "'''+TF_DATA+'''" 
  }
  label_map_path: "'''+CLASS_LABELS+'''" 
}

Why are the total losses so different between training and evaluation when both use the same data?

When I use only the legacy/train.py script instead, I already see sensible bounding boxes after 1000 steps.

Comments:

Hi, it would be helpful if you could share your code or a reproducible example. At least the functions used for training and evaluation are necessary to produce an answer. – Johannes Ackermann

Need reproducible code; can't reproduce, can't debug. Ideally, I'd copy-paste the code and see what you're seeing. – OverLordGoldDragon

1 Answer


Your question is not reproducible, so it is hard to find the root cause of your problem.

Still, you should be aware that some parts of the network behave differently during training and testing (evaluation).

The first is dropout, which is active only during training; however, this should not make evaluation results worse. A minimal sketch of the mode difference is shown below.
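
A minimal PyTorch sketch of the dropout train/eval difference (hypothetical tensor sizes, not the OP's model):

import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()     # training mode: units are randomly zeroed
print(drop(x))   # surviving units are scaled by 1/(1-p) = 2.0

drop.eval()      # evaluation mode: dropout is the identity
print(drop(x))   # all ones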

The second, and most critical, is batch normalization. At least in PyTorch, in training mode it normalizes with the current batch's statistics and uses them to update its running estimates, whereas in evaluation mode it normalizes with the running statistics accumulated during training. It can therefore produce different results in training and testing, especially when the batch size is very small (here, BATCH_SIZE = 1). A sketch of this discrepancy follows.
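
A minimal PyTorch sketch of the batch-norm discrepancy (hypothetical shapes; PyTorch needs at least 2 samples to compute batch statistics in training mode, but the point carries over to tiny batches in general):

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)             # running mean/var start at 0/1
x = 5.0 + 3.0 * torch.randn(2, 4)  # tiny batch whose statistics are far from N(0, 1)

bn.train()
y_train = bn(x)   # normalized with the *current batch* statistics;
                  # the running estimates are only nudged toward them

bn.eval()
y_eval = bn(x)    # normalized with the *running* statistics, which are
                  # still close to their 0/1 initialization here
print(y_train.abs().max(), y_eval.abs().max())  # very different scales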

Related question.