I am playing with py-faster-rcnn on a custom dataset (about 3000 images, 7 different classes, including the background), and following these tutorials:
- Fast-RCNN tutorial: https://github.com/zeyuanxy/fast-rcnn/blob/master/help/train/README.md
- Faster-RCNN tutorial: https://github.com/deboc/py-faster-rcnn/tree/master/help
I am using the end2end solution with the VGG16 network. Everything runs fine, except that my results are poor, so I have some questions:
- What kind of normalizations are needed on the images and on the bbox annotations?
- Related to the previous question: there are two config options, BBOX_NORMALIZE_TARGETS and BBOX_NORMALIZE_TARGETS_PRECOMPUTED. Should I calculate the mean and std before training and use these options for bbox normalization?
- I modified the num_output of the cls_score and bbox_pred layers (according to this thread: https://github.com/rbgirshick/py-faster-rcnn/issues/1), but in the end2end solution there are also rpn_cls_score and rpn_bbox_pred layers. Should I modify their num_output as well? If so, how do I calculate the number of outputs for 7 classes?
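
For reference, my understanding of that thread is that cls_score needs one output per class and bbox_pred needs four outputs per class, with the background counted as a class. Here is a minimal sketch of that arithmetic; the helper name is my own and is not part of py-faster-rcnn, and it says nothing about the rpn_* layers, which is the part I am unsure about:

```python
def detection_head_outputs(num_classes):
    """Return (cls_score, bbox_pred) num_output values.

    Hypothetical helper for illustration only.
    num_classes is assumed to include the background class.
    """
    cls_score = num_classes      # one score per class
    bbox_pred = 4 * num_classes  # four box-regression values per class
    return cls_score, bbox_pred


cls_score, bbox_pred = detection_head_outputs(7)
print(cls_score, bbox_pred)
```

For 7 classes (background included) this gives 7 for cls_score and 28 for bbox_pred.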