I am currently reading the paper 'CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection'. It uses skip connections to fuse conv3-3, conv4-3 and conv5-3 together. The steps are shown below:
1. Extract the feature maps of the face region (at multiple scales: conv3-3, conv4-3, conv5-3) and apply RoI-Pooling to them (i.e. convert them to a fixed height and width).
2. L2-normalize each feature map.
3. Concatenate the (RoI-pooled and normalized) feature maps of the face (at multiple scales) with each other (this creates one tensor).
4. Apply a 1x1 convolution to the face tensor.
5. Apply two fully connected layers to the face tensor, creating a vector.
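To make the normalize, concatenate and 1x1-convolve steps concrete, here is a minimal pure-Python sketch of what happens at a single cell of the RoI-pooled grid. It is only an illustration under assumptions: `l2_normalize` and `fuse_location` are hypothetical helper names, the normalization is assumed to run over the channel axis at each location, and the toy vectors and weights are made up. A 1x1 convolution without padding is just a linear map over the concatenated channels at each location, which is what the last line of `fuse_location` computes.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale one channel vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def fuse_location(per_layer_vectors, weight, bias):
    """Fuse the channel vectors several layers produce at one pooled cell.

    per_layer_vectors: one channel vector per layer (e.g. conv3_3, conv4_3, conv5_3)
    weight: 1x1 convolution weights, one row per output channel
    bias: one bias value per output channel
    """
    fused = []
    for vec in per_layer_vectors:
        # Normalize each layer separately, then concatenate along channels.
        fused.extend(l2_normalize(vec))
    # 1x1 convolution at this location: a dot product per output channel.
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weight, bias)]

# Toy example: three layers with two channels each, fused into two channels.
conv3 = [3.0, 4.0]
conv4 = [0.0, 2.0]
conv5 = [1.0, 0.0]
weight = [[1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
          [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
out = fuse_location([conv3, conv4, conv5], weight, bias)
print(out)  # approximately [2.6, 0.8]
```

In the real network the same map is applied at each of the 7x7 pooled cells, with 256 + 512 + 512 input channels for the three VGG16 layers.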
I used Caffe and wrote a prototxt based on the Faster R-CNN VGG16 model. The following parts were added to the original prototxt:
# roi pooling the conv3-3 layer and L2 normalize it
layer {
  name: "roi_pool3"
  type: "ROIPooling"
  bottom: "conv3_3"
  bottom: "rois"
  top: "pool3_roi"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.25 # 1/4
  }
}
layer {
  name: "roi_pool3_l2norm"
  type: "L2Norm"
  bottom: "pool3_roi"
  top: "pool3_roi"
}
-------------
# roi pooling the conv4-3 layer and L2 normalize it
layer {
  name: "roi_pool4"
  type: "ROIPooling"
  bottom: "conv4_3"
  bottom: "rois"
  top: "pool4_roi"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.125 # 1/8
  }
}
layer {
  name: "roi_pool4_l2norm"
  type: "L2Norm"
  bottom: "pool4_roi"
  top: "pool4_roi"
}
--------------------------
# roi pooling the conv5-3 layer and L2 normalize it
layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "roi_pool5_l2norm"
  type: "L2Norm"
  bottom: "pool5"
  top: "pool5"
}
# concat roi_pool3, roi_pool4, roi_pool5 and apply 1*1 conv
layer {
  name: "roi_concat"
  type: "Concat"
  concat_param {
    axis: 1
  }
  bottom: "pool5"
  bottom: "pool4_roi"
  bottom: "pool3_roi"
  top: "roi_concat"
}
layer {
  name: "roi_concat_1*1_conv"
  type: "Convolution"
  top: "roi_concat_1*1_conv"
  bottom: "roi_concat"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "roi_concat_1*1_conv"
  top: "fc6"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 4096
  }
}
During training, I ran into the following error:
F0616 16:43:02.899025 3712 net.cpp:757] Cannot copy param 0 weights from layer 'fc6'; shape mismatch. Source param shape is 1 1 4096 25088 (102760448); target param shape is 4096 10368 (42467328).
To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
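The two sizes in the message can at least be reproduced from the layer shapes. The sketch below is a rough check, not a fix. It assumes Caffe's usual convolution output formula, `(size + 2 * pad - kernel) / stride + 1`, and that the saved net is the stock VGG16 Faster R-CNN model, whose fc6 takes a 512 x 7 x 7 input per RoI. `conv_output_size` and `fc_input_size` are hypothetical helper names.

```python
def conv_output_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def fc_input_size(channels, pooled, kernel, pad):
    """Flattened input size of an InnerProduct layer fed by a square conv output."""
    side = conv_output_size(pooled, kernel, pad)
    return channels * side * side

# Prototxt above: 128 output channels, 7x7 RoI pooling, kernel_size 1, pad 1.
target = fc_input_size(channels=128, pooled=7, kernel=1, pad=1)
print(conv_output_size(7, kernel=1, pad=1))  # 9 (pad: 1 grows 7x7 to 9x9)
print(target)                                # 10368
print(4096 * target)                         # 42467328, the target param count

# Saved net (assumed stock VGG16): fc6 expects 512 channels on a 7x7 grid.
source = fc_input_size(channels=512, pooled=7, kernel=1, pad=0)
print(source)                                # 25088
print(4096 * source)                         # 102760448, the source param count
```

So the saved fc6 weights can only be copied if the layer feeding fc6 produces 512 x 7 x 7 values per RoI. Otherwise fc6 has to be renamed, as the message suggests, and trained from scratch.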
I could not find out what goes wrong. I would be grateful if you could spot the problem or explain what is happening.
Much appreciated!