I am trying the TensorFlow faster_rcnn_resnet101 model to detect multiple objects in a 642×481 image. I tried two approaches, but neither gave satisfactory results.
1) I cropped the objects (50×160 rectangles) out of the training images (the training images may have different dimensions than the 642×481 test image) and used those crops to train faster_rcnn_resnet101. The results look good when the test set also consists of crops of the same size, but on the full 642×481 test image the model fails to detect the multiple objects reliably.
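For reference, this is roughly the cropping step I used; the file names, box coordinates, and the crop_objects helper are placeholders for illustration:

```python
from PIL import Image

# Illustrative helper for approach 1: cut each annotated object out of
# a training image and save it as its own training example.
def crop_objects(image_path, boxes):
    img = Image.open(image_path)
    # `boxes` are (left, upper, right, lower) pixel coordinates, ~50x160 each.
    return [img.crop(box) for box in boxes]

crops = crop_objects("train_0001.jpg", [(120, 80, 170, 240)])
for i, crop in enumerate(crops):
    crop.save(f"crop_{i}.jpg")
```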
Then I thought that maybe the model rescales the test image the same way it rescales the 50×160 crops, so the objects end up at a very different scale than during training and the details get lost. With that in mind, I tried another approach.
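My rescaling suspicion seems plausible given how the stock pipeline resizes inputs. Assuming the usual faster_rcnn config with a keep_aspect_ratio_resizer set to min_dimension: 600 and max_dimension: 1024 (an assumption; I haven't checked anything beyond the defaults), the sketch below suggests a 50×160 crop gets upscaled about 6.4×, while the 642×481 test image is only upscaled about 1.25×, so at test time the network sees the objects at roughly a fifth of the size it trained on:

```python
def keep_aspect_ratio_scale(width, height, min_dim=600, max_dim=1024):
    # Mimics a keep_aspect_ratio_resizer: scale so the short side reaches
    # min_dim, but cap the long side at max_dim if it would overshoot.
    scale = min_dim / min(width, height)
    if max(width, height) * scale > max_dim:
        scale = max_dim / max(width, height)
    return scale

print(keep_aspect_ratio_scale(50, 160))   # ~6.4x for a training crop
print(keep_aspect_ratio_scale(642, 481))  # ~1.25x for the test image
```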
2) I pasted each cropped image onto a 642×481 white background (padding, basically), so every training image has the same dimensions as the test image. The location of each crop on the background is deliberately random. However, the model still does not detect objects well in the test image. To probe its behavior, I used GIMP to keep windows that contain objects and replace everything outside those windows with white pixels; the results are much better. If only one object window is kept in the image and everything else is white, the result is excellent.
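For completeness, here is a sketch of the padding step from approach 2, assuming a white RGB background and one crop per canvas; the returned box is what I write into the annotation for that synthetic image:

```python
import random
from PIL import Image

def paste_on_canvas(crop, canvas_size=(642, 481)):
    # Paste a cropped object onto a white canvas at a random offset and
    # return the composite plus the object's pixel bounding box.
    canvas = Image.new("RGB", canvas_size, (255, 255, 255))
    cw, ch = crop.size
    x = random.randint(0, canvas_size[0] - cw)
    y = random.randint(0, canvas_size[1] - ch)
    canvas.paste(crop, (x, y))
    return canvas, (x, y, x + cw, y + ch)

crop = Image.open("crop_0.jpg")  # one of the ~50x160 crops from approach 1
image, bbox = paste_on_canvas(crop)
image.save("padded_train_0001.jpg")
```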
So my question is: what is happening behind the scenes here, and how can I make the model successfully detect multiple objects in the test images? Thanks.