In Fast R-CNN, I understand that you first apply a CNN to the whole image to get a feature map. Then you use the ROIs generated by an external region proposal method (selective search) to get the bounding boxes of potential objects of interest. However, I don't understand how you get the features associated with each region of interest from the feature map.
For example, I run selective search and get a list of (x, y, width, height) boxes. Then I apply a CNN (InceptionV3) to get a 2048x1 feature vector (from the pool3 layer). How do I get the regions of interest from this feature vector of the whole image? Or am I interpreting the method incorrectly?
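To make the example concrete, here is a minimal sketch of my setup and my best guess at what "getting the features for a region" might mean. Random arrays stand in for the real network outputs; the box values, the 8x8x2048 shape of the last convolutional map, and the helper `crop_roi` are my own assumptions, and I'm not sure this matches what Fast R-CNN actually does:

```python
import numpy as np

# Hypothetical stand-ins: random arrays instead of real network outputs.
image_h, image_w = 299, 299                        # assumed InceptionV3 input size
rois = [(30, 40, 120, 90), (150, 10, 100, 200)]    # made-up (x, y, width, height) boxes

pooled = np.random.rand(2048)              # pool3-style vector: no spatial axes left
feature_map = np.random.rand(8, 8, 2048)   # assumed last conv map: (rows, cols, channels)


def crop_roi(feature_map, roi, image_h, image_w):
    """Project an image-space box onto the feature map and slice it out (my guess)."""
    x, y, w, h = roi
    fh, fw = feature_map.shape[:2]
    sx, sy = fw / image_w, fh / image_h    # image pixels -> feature map cells
    x0 = int(np.floor(x * sx))
    y0 = int(np.floor(y * sy))
    # Round outward and keep at least one cell in each direction.
    x1 = max(int(np.ceil((x + w) * sx)), x0 + 1)
    y1 = max(int(np.ceil((y + h) * sy)), y0 + 1)
    return feature_map[y0:y1, x0:x1, :]


crops = [crop_roi(feature_map, r, image_h, image_w) for r in rois]
for roi, crop in zip(rois, crops):
    print(roi, "->", crop.shape)
```

If this guess is roughly right, the crops have different spatial sizes per box, and the pooled 2048-d vector cannot be used at all because it has no spatial positions left to crop from. That is the part I'm unsure about.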
Thanks for your help!