This article gives a good description of RoI pooling and of how to map the RoI bounding box (BB) from the original label onto the feature map.
https://medium.com/datadriveninvestor/review-on-fast-rcnn-202c9eadd23b
Basically, the goal of RoI pooling is to output a fixed size feature map from an arbitrary size section of the CNN output feature map.
To do this, you first do RoI projection, which translates the RoI BB (x,y,h,w) from the original image into the corresponding RoI BB (x',y',h',w') on the feature map. This is done by scaling it by the sub-sampling ratio, i.e. the feature map size divided by the image size.
Example:
- If your image is 18x18 and your feature map is 3x3, then your sub-sampling ratio is 3/18 = 1/6.
- To get your projected RoI BB, multiply each of your original BB values by that ratio, e.g. x' = (3/18)x.
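Here is a minimal sketch of that projection in plain Python, using the 18x18 image and 3x3 feature map from the example. The function name `project_roi` and the sample box are made up for illustration, and the box is assumed to be given as (x, y, h, w) as above.

```python
def project_roi(box, image_size, feature_size):
    """Scale an (x, y, h, w) box from image coordinates to feature-map coordinates.

    image_size and feature_size are (height, width) tuples.
    """
    x, y, h, w = box
    img_h, img_w = image_size
    feat_h, feat_w = feature_size
    # Multiply by the sub-sampling ratio (feature size / image size).
    # Horizontal values use the width ratio, vertical values the height ratio.
    return (x * feat_w / img_w,
            y * feat_h / img_h,
            h * feat_h / img_h,
            w * feat_w / img_w)


# Hypothetical box in an 18x18 image, projected onto a 3x3 feature map.
print(project_roi((6, 6, 12, 12), (18, 18), (3, 3)))  # (1.0, 1.0, 2.0, 2.0)
```

The projected values are not always whole numbers, so in practice they get rounded to feature map cells before pooling.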
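Then you just do the pooling on that section of the feature map: split it into an H×W grid of pooling windows, each roughly (h'/H)×(w'/W) in size, and take the max of each window. Here H and W are the height and width of your target output for the pooling layer, and h' and w' are the height and width of the projected RoI.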
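As a rough sketch of the pooling step (not the paper's exact implementation), the plain-Python function below max-pools a projected RoI into a fixed H×W output. The name `roi_max_pool` is made up for illustration, and it assumes the projected RoI has already been rounded to whole feature map cells, with (x, y) as its top-left corner.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x, y, h, w) of a 2D feature map to out_h x out_w.

    Assumes integer coordinates with (x, y) the top-left cell of the region.
    """
    x, y, h, w = roi
    pooled = []
    for i in range(out_h):
        # Window i spans about h / out_h rows of the region.
        r0 = y + (i * h) // out_h
        r1 = y + ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            # Window j spans about w / out_w columns of the region.
            c0 = x + (j * w) // out_w
            c1 = x + ceil_div((j + 1) * w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# 4x4 feature map holding the values 0..15, row by row.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
print(roi_max_pool(fmap, (1, 1, 3, 3), 2, 2))  # [[10, 11], [14, 15]]
```

When h' or w' is not divisible by H or W, the windows in this sketch overlap by a cell instead of leaving gaps. That is just one way to handle the "~" in the window size. Either way, the output is always H×W no matter how big the RoI is.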
The article gives a much better description and I encourage you to check it out and the original paper!