Yes, it is important that every object you want to detect is labeled in the images of the training dataset. You teach the network to find objects where they are, and not to find objects where there are none.
The Yolo CNN tries to solve 3 problems:
- mark with a rectangle the objects that Yolo was trained on - positive error on the last layer
- don't mark one object as another object - negative error on the last layer
- don't mark any objects in the background - negative error on the last layer
I.e. Yolo looks for differences: why the first dog is considered an object and the second is considered background. If you want to find any dog but label only some of them, and the labeled dogs are not statistically different from the unlabeled dogs, then detection accuracy will be extremely low, because abs(positive_error) ~= abs(negative_error) and the result of training is sum(positive_errors) + sum(negative_errors) ~= 0. It is a contradictory task: you want to find a dog and not find the dog at the same time.
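The cancellation can be illustrated with a toy model. This is only a sketch and not Yolo's actual loss: it assumes a single objectness score trained with binary cross-entropy on samples that all look identical, and the function name `train_objectness` is made up for the illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_objectness(labels, steps=2000, lr=0.5):
    """Fit one logit b to identical inputs that carry the given 0/1 labels.

    Every sample has the same features (a hypothetical "dog" patch), so the
    model can only learn one score for all of them.
    """
    b = 0.0
    for _ in range(steps):
        p = sigmoid(b)
        # Gradient of the mean binary cross-entropy w.r.t. the logit: mean(p - y).
        grad = sum(p - y for y in labels) / len(labels)
        b -= lr * grad
    return sigmoid(b)

# All dogs labeled: the score is pushed towards 1.
fully_labeled = train_objectness([1] * 10)

# Half of the identical dogs are unlabeled, so they count as background:
# positive and negative errors cancel and the score never leaves 0.5.
half_labeled = train_objectness([1] * 5 + [0] * 5)

print(fully_labeled, half_labeled)
```

With half of the dogs unlabeled, the summed gradient is zero from the first step, so the model stays undecided no matter how long it trains.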
But if the labeled dogs are statistically different from the unlabeled dogs, for example if bulldogs are labeled and labradors are not, then the Yolo network will be trained to distinguish one from the other.
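A toy sketch of this case, again not Yolo itself: it assumes a one-feature logistic model in which a hypothetical feature `x` is +1 for bulldogs (labeled) and -1 for labradors (unlabeled, so treated as background). The name `train_one_feature` is invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_one_feature(samples, steps=2000, lr=0.5):
    """Gradient descent on mean binary cross-entropy for p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y  # per-sample error on the output
            gw += err * x
            gb += err
        w -= lr * gw / len(samples)
        b -= lr * gb / len(samples)
    return w, b

# Hypothetical data: x = +1 for bulldogs (label 1), x = -1 for labradors (label 0).
samples = [(1.0, 1)] * 5 + [(-1.0, 0)] * 5
w, b = train_one_feature(samples)

p_bulldog = sigmoid(w * 1.0 + b)
p_labrador = sigmoid(w * -1.0 + b)
print(p_bulldog, p_labrador)
```

Because the two groups differ in the feature, the errors no longer cancel: the model learns a high score for bulldogs and a low score for labradors.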
It seems so, because after 3000 batches it didn't detect anything.
That is not enough; Yolo requires 10000 - 40000 iterations.