Why is Faster R-CNN end-to-end training only an approximation?

Question

In Faster R-CNN (https://arxiv.org/abs/1506.01497), there are two ways to train the network.

One way is to train the RPN and Fast R-CNN jointly.

The other way is to train both the RPN and Fast R-CNN in an end-to-end manner.

However, the authors say that the end-to-end training is only an approximation of joint training.

The reason they give for it being an approximation is:

this solution ignores the derivative w.r.t. the proposal boxes’ coordinates that are also network responses, so is approximate.

However, according to the network definition (https://github.com/rbgirshick/py-faster-rcnn/blob/master/models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt), the bounding-box regression for the RPN is updated in every training iteration, so it does not seem to be ignored.

So why does it ignore the derivative w.r.t. the proposal boxes' coordinates? What does that mean?

I am also curious about this point, did you find a solution? — Collin Zhang

Collin Zhang · Accepted Answer · 2021-08-10T14:54:24

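The slides "Training R-CNNs of various velocities" discuss this in detail on pages 40-45. In short, the RoI pooling layer takes two inputs, the feature map and the proposal box coordinates, and the derivative of the loss with respect to the box coordinates is undefined for that layer. A surrogate gradient is used instead: the box coordinates are treated as constants, and the gradient flows back only through the feature map.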
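To make the point concrete, here is a minimal sketch in plain Python. It is a toy 1-D stand-in for RoI max pooling, not code from py-faster-rcnn, and the names roi_max_pool_1d and finite_diff and the feature values are all made up for illustration. It assumes box coordinates are rounded to integer cells before pooling. Under that assumption, the pooled output is piecewise constant in the box coordinates, so its derivative with respect to them is either zero or a jump, while the derivative with respect to the feature values is well behaved. Note that the RPN's box regression branch is still trained by its own loss in every iteration; what gets dropped is only the path from the detection loss through RoI pooling into the box coordinates.

```python
def roi_max_pool_1d(features, x1, x2):
    """Max-pool a 1-D feature map over the box [x1, x2].

    The coordinates are rounded to integer cells first, which is the
    quantization step this sketch assumes for RoI pooling.
    """
    lo = max(0, int(round(x1)))
    hi = min(len(features) - 1, int(round(x2)))
    return max(features[lo:hi + 1])


def finite_diff(f, x, eps=1e-3):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


features = [0.1, 0.9, 0.3, 2.0, 0.5]  # toy feature map (made-up values)
x1 = 0.2                              # left box edge, held fixed


def pooled_vs_x2(x2):
    """Pooled output as a function of the right box edge only."""
    return roi_max_pool_1d(features, x1, x2)


def pooled_vs_feature(v):
    """Pooled output as a function of one feature value, box held fixed."""
    bumped = list(features)
    bumped[1] = v
    return roi_max_pool_1d(bumped, x1, 2.2)


# Away from a rounding boundary, nudging the box changes nothing.
grad_inside = finite_diff(pooled_vs_x2, 2.2)

# At a rounding boundary the output jumps, so the estimate is a large
# number that keeps growing as eps shrinks: no usable derivative.
grad_at_jump = finite_diff(pooled_vs_x2, 2.5)

# With respect to the feature that wins the max, the gradient is fine.
grad_feature = finite_diff(pooled_vs_feature, features[1])

print(grad_inside, grad_at_jump, grad_feature)
```

Treating the box as a constant during backpropagation, as the approximate scheme does, amounts to using only the last kind of gradient.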

P.S.

Link to ICCV 2015 Tutorial

The GitHub README page pointed me to the slides.
