I am building a RCNN detection network using Tensorflow's object detection API.
My goal is to detect bounding boxes for animals in outdoor videos. Most frames do not have animals and are just of dynamic backgrounds.
Most tutorials focus on training custom labels, but make no mention of negative training samples. How do these class of detectors deal with images which do not contain objects of interest? Does it just output a low probability, or will it force to try to draw a bounding box within an image?
My current plan is to use traditional background subtraction in opencv to generate potential frames and pass them to a trained network. Should I also include a class of 'background' bounding boxes as 'negative data'?
The final option would be to use opencv for background subtraction, RCNN to generate bounding boxes, then a classification model of crops to identify animals versus background.