I've been playing around with Fast-RCNN for a while, but I still don't understand some of its core mechanisms.
In the tutorial slides (page 28 of http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf), they have an example output with only one bounding box per object:
http://s22.postimg.org/7rbu05xbl/Screen_Shot_2015_12_04_at_2_19_57_PM.png
As far as I can tell, non-maximum suppression is performed on all region proposals (https://github.com/rbgirshick/fast-rcnn/blob/master/lib/fast_rcnn/test.py#L324), but in my case the output still contains tens of regions for each object in the image.
My bounding boxes look like the following with a threshold of 0.99:
http://s29.postimg.org/oc33ujgrb/foo.jpg
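For reference, this is how I understand greedy non-maximum suppression. It is my own minimal NumPy sketch, not code copied from the repo: the function name `nms`, the `[x1, y1, x2, y2, score]` row layout, and the overlap threshold of 0.3 are all assumptions on my part. Note that the overlap threshold here is a different thing from the 0.99 score threshold I mentioned above.

```python
import numpy as np

def nms(dets, thresh):
    """Greedy NMS sketch (hypothetical helper, not the repo's code).

    dets:   (N, 5) array, each row [x1, y1, x2, y2, score]
    thresh: IoU above which a lower-scoring box is suppressed
    Returns the indices of the boxes that are kept.
    """
    x1, y1, x2, y2, scores = (dets[:, k] for k in range(5))
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)   # pixel-inclusive box areas
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]                        # best remaining box
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop everything that overlaps box i too much
        order = rest[iou <= thresh]
    return keep

# Two heavily overlapping boxes plus one separate box (made-up numbers)
boxes = np.array([
    [10, 10, 50, 50, 0.999],
    [12, 12, 52, 52, 0.995],
    [100, 100, 150, 150, 0.992],
])
print(nms(boxes, 0.3))  # [0, 2]
```

If this is what happens, I would expect the second box above to be dropped even though its score is over 0.99, which is not what I see in my output.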
How and where are the bounding boxes for overlapping regions merged into a single box per object?