2
votes

I understand that R-CNN needs selective search as an external algorithm for generating region-of-interest proposals, but in Fast R-CNN we can simply take the entire image, pass it through the convolutional network to create a feature map, and then use a single-level SPP layer (the RoI pooling layer).

On the other hand, SPP-net uses a multi-level SPP layer. For quick reference: [image]
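The difference between single-level and multi-level pooling can be sketched in a few lines of NumPy. This is only an illustration, not code from either paper: the helper names `max_pool_grid` and `spp_features` are made up, and the grid sizes (7x7 for RoI pooling, a 1x1, 2x2, 3x3, 6x6 pyramid for SPP) are the commonly cited settings, used here as assumptions.

```python
import numpy as np

def max_pool_grid(region, bins):
    """Max-pool a (C, H, W) feature-map region into a fixed (C, bins, bins) grid."""
    c, h, w = region.shape
    out = np.empty((c, bins, bins), dtype=region.dtype)
    for i in range(bins):
        # floor for the start and ceil for the end, so no bin is ever empty
        y0 = (i * h) // bins
        y1 = -(-((i + 1) * h) // bins)
        for j in range(bins):
            x0 = (j * w) // bins
            x1 = -(-((j + 1) * w) // bins)
            out[:, i, j] = region[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out

def spp_features(region, levels):
    """Concatenate the pooled grids of every pyramid level into one vector."""
    return np.concatenate([max_pool_grid(region, n).ravel() for n in levels])

rng = np.random.default_rng(0)
roi_a = rng.standard_normal((256, 13, 9))   # two RoIs of different sizes
roi_b = rng.standard_normal((256, 5, 20))

multi_level = (1, 2, 3, 6)   # assumed SPP-net style pyramid
single_level = (7,)          # assumed Fast R-CNN style RoI pooling

# Both RoIs end up with the same fixed length, whatever their size.
print(spp_features(roi_a, multi_level).shape, spp_features(roi_b, multi_level).shape)
print(spp_features(roi_a, single_level).shape, spp_features(roi_b, single_level).shape)
```

In both cases the output length depends only on the number of channels and the grid sizes, which is what lets a fully connected layer follow regions of arbitrary shape.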

In all three of slow R-CNN, SPP-net, and Fast R-CNN, the regions of interest (RoIs) come from a proposal method ("selective search", ??, ?? respectively).

Could anyone explain in detail, with citations, which proposal methods are explicitly used in SPP-net and Fast R-CNN? I couldn't find this stated clearly in the research papers.


1 Answer

2
votes

The official GitHub repos show that both SPP-net and Fast R-CNN use the same region proposal method as R-CNN, namely selective search:

See the SPP_net and Fast R-CNN repos. The SPP_net repo includes a selective search module for computing region proposals, and in the Fast R-CNN repo the author specifically mentions that object proposals are computed with selective search.

That said, region proposals can also be generated with other methods, since R-CNN and Fast R-CNN treat the object proposal method as an external module that is independent of the detector.
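To make the "external module" point concrete, here is a minimal sketch in which the detector only receives a function that returns boxes. Everything in it is hypothetical: `sliding_window_proposals` is a toy stand-in (it is not selective search), and `detect` with a mean-intensity `score` is a placeholder for the real CNN classifier.

```python
from typing import Callable, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixels

def sliding_window_proposals(image: np.ndarray, size: int = 64, stride: int = 32) -> List[Box]:
    """Toy proposal method (hypothetical): dense square windows, NOT selective search."""
    h, w = image.shape[:2]
    return [(x, y, x + size, y + size)
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def detect(image: np.ndarray,
           propose: Callable[[np.ndarray], List[Box]],
           score: Callable[[np.ndarray], float],
           threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Score every proposed box; the detector does not care where the boxes come from."""
    results = []
    for (x0, y0, x1, y1) in propose(image):
        s = score(image[y0:y1, x0:x1])
        if s >= threshold:
            results.append(((x0, y0, x1, y1), s))
    return results

# Toy image with one bright square; the "classifier" is just mean intensity.
image = np.zeros((128, 128))
image[64:128, 64:128] = 1.0
detections = detect(image, sliding_window_proposals, lambda crop: float(crop.mean()), threshold=0.9)
print(detections)
```

Swapping in selective search, EdgeBoxes, or any other method only means passing a different `propose` function; `detect` itself stays unchanged.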

Generally speaking, a method that generates more proposals can improve the final detection accuracy, but at the cost of detection speed. Section 2 ('Related Work') of the Faster R-CNN paper has a nice summary of object proposal methods.

For the follow-up question, namely how to intuitively picture region proposals on the feature map, the following picture (ref) illustrates it better: image_ref

In the picture, after the convolution operation the red box on the left becomes the red square in the output volume on the right, the green box becomes the green square, and so on. Now imagine the whole 7x7 on the left is the region proposal; on the output feature map, it is still a region proposal! Of course, in reality the image on the left has many more pixels, so there can be many region proposals, and each of them will still look like a region proposal on the output feature map!

Finally, in the original SPP_net paper, the authors explain exactly how they transform region proposals from the original image into candidate windows on the feature map. [image]
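As I read the appendix of that paper, the projection rule is roughly: with S the product of all strides before the feature map, the left/top boundary maps to x' = floor(x / S) + 1 and the right/bottom boundary maps to x' = ceil(x / S) - 1. The sketch below just applies that rule; the function name `project_window` and the default S = 16 are my own assumptions for illustration, so check them against the paper and the network you actually use.

```python
import math

def project_window(box, stride=16):
    """Map an image-space box (x0, y0, x1, y1) to feature-map coordinates.

    Assumes the rule: left/top -> floor(x / S) + 1, right/bottom -> ceil(x / S) - 1,
    where S (here `stride`) is the total stride of the layers before the feature map.
    """
    x0, y0, x1, y1 = box
    left = math.floor(x0 / stride) + 1
    top = math.floor(y0 / stride) + 1
    right = math.ceil(x1 / stride) - 1
    bottom = math.ceil(y1 / stride) - 1
    return left, top, right, bottom

# A proposal found on the image by selective search (made-up coordinates).
print(project_window((32, 48, 200, 160)))  # (3, 4, 12, 9)
```

The window on the feature map is then pooled to a fixed size, so the proposal method only ever has to run on the original image.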