Faster-RCNN bbox/image normalization

Question

I am playing with py-faster-rcnn on a custom dataset (about 3000 images, 7 different classes, including the background), and following these tutorials:

https://github.com/zeyuanxy/fast-rcnn/blob/master/help/train/README.md (Fast-RCNN tutorial)
https://github.com/deboc/py-faster-rcnn/tree/master/help (Faster-RCNN tutorial)

I am using the end2end solution with the VGG16 network. Everything works fine except my results, so I have some questions:

1. What kind of normalization is needed for the images and for the bbox annotations?
2. Related to the previous question: there are two config options, BBOX_NORMALIZE_TARGETS and BBOX_NORMALIZE_TARGETS_PRECOMPUTED. Should I calculate the mean and std before training and use these options for bbox normalization?
3. I modified num_output in the cls_score and bbox_pred layers (following this thread: https://github.com/rbgirshick/py-faster-rcnn/issues/1), but the end2end solution also has rpn_cls_score and rpn_bbox_pred layers. Should I modify their num_output too? If so, how do I calculate the number of outputs for 7 classes?

Bharat Bharat · Accepted Answer · 2016-12-30T19:34:39

No, you do not need to precompute anything. If you set BBOX_NORMALIZE_TARGETS_PRECOMPUTED to False, lib/roi_data_layer/roidb.py computes the mean and standard deviation of the bbox regression targets for your dataset; otherwise it uses the default values specified in lib/fast_rcnn/config.py.

The RPN is agnostic to the number of classes. It treats any region that contains an object as positive and everything else as negative, so the rpn_cls_score and rpn_bbox_pred layers do not depend on your 7 classes.
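
To make the normalization step concrete, here is a minimal NumPy sketch of what computing and applying the target statistics could look like. It is not the repository's code: the function names are mine, the box parameterization follows the usual (dx, dy, dw, dh) convention, and the statistics are pooled over all classes for brevity, whereas roidb.py keeps them per class. The default means and stds shown are the ones I recall from config.py, so check them against your copy.

```python
import numpy as np

# Assumed defaults (verify against lib/fast_rcnn/config.py in your checkout).
DEFAULT_MEANS = np.array([0.0, 0.0, 0.0, 0.0])
DEFAULT_STDS = np.array([0.1, 0.1, 0.2, 0.2])


def bbox_transform(ex_rois, gt_rois):
    """Regression targets (dx, dy, dw, dh) from proposals to ground-truth boxes.

    Boxes are (x1, y1, x2, y2) in pixels with inclusive coordinates.
    """
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h

    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h

    dx = (gt_cx - ex_cx) / ex_w
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)
    dh = np.log(gt_h / ex_h)
    return np.stack([dx, dy, dw, dh], axis=1)


def target_stats(targets):
    """Mean and std of each target coordinate over the whole dataset."""
    return targets.mean(axis=0), targets.std(axis=0)


def normalize_targets(targets, means, stds):
    """Shift and scale the targets so the regression loss is well conditioned."""
    return (targets - means) / stds


# A proposal shifted one pixel to the left of a 10-pixel-wide ground-truth box:
# dx is 0.1, which becomes 1.0 after dividing by the default std of 0.1.
proposal = np.array([[0.0, 0.0, 9.0, 9.0]])
gt_box = np.array([[1.0, 0.0, 10.0, 9.0]])
raw = bbox_transform(proposal, gt_box)
normalized = normalize_targets(raw, DEFAULT_MEANS, DEFAULT_STDS)
```

With PRECOMPUTED set to False you would pass the output of target_stats instead of the defaults. Either way, the same means and stds have to be used to un-normalize the predictions at test time.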

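For the num_output question, the arithmetic can be written down as a small helper. This is only an illustration: head_num_outputs is a hypothetical function, not part of py-faster-rcnn, and it assumes the stock RPN setup of 9 anchors per location (3 scales x 3 aspect ratios). If you changed the anchors, substitute your own count.

```python
def head_num_outputs(num_classes, num_anchors=9):
    """Expected num_output values for the detection head and the RPN.

    num_classes includes the background class.
    num_anchors is the number of anchors per feature-map location
    (assumed 9 here: 3 scales x 3 aspect ratios).
    """
    return {
        # Detection head: one score and one 4-d box per class.
        "cls_score": num_classes,
        "bbox_pred": 4 * num_classes,
        # RPN: object / not-object score and one 4-d box per anchor,
        # independent of the number of classes.
        "rpn_cls_score": 2 * num_anchors,
        "rpn_bbox_pred": 4 * num_anchors,
    }


sizes = head_num_outputs(num_classes=7)
# cls_score = 7, bbox_pred = 28, rpn_cls_score = 18, rpn_bbox_pred = 36
```

Only the first two values change with the dataset; the RPN values stay the same for any number of classes.
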