Let's say we have n images each of cats and dogs, and we trained an image classification model that, given a new image, outputs a probability score saying whether it's a cat or a dog.
Now we are getting images that contain multiple cats and dogs in the same image. How can we detect and localize the objects (cats and dogs here)?
If that is possible, can we also depict the focus areas the model considered for its prediction, so that a bounding box could be drawn?
Use OpenCV or any other toolbox to extract pixels from the image and form sub-images. Pass these sub-images to the second network, and once all of them have been fed through, put all the predictions into one array. Finally, annotate the original image using the one-to-one correspondence between each sub-image and its bounding-box location. Good luck. Also, let me know if I should write this up as an answer. – learner
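The procedure described in the comment can be sketched roughly as follows. This is a minimal sliding-window illustration, not the commenter's actual code: `classify` is a hypothetical stand-in for the already-trained cat-vs-dog classifier (it maps a crop to a `(label, probability)` pair), and the window size, stride, and threshold are arbitrary assumptions.

```python
import numpy as np

def sliding_windows(image, win=64, stride=32):
    """Yield (x, y, crop) sub-images covering the image."""
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]

def detect(image, classify, threshold=0.9, win=64, stride=32):
    """Classify every crop; collect confident boxes in one array.

    `classify` is a hypothetical callable standing in for the
    trained cat-vs-dog network: crop -> (label, probability).
    Returns a list of (x, y, w, h, label, prob) tuples that can be
    drawn onto the original image (e.g. with cv2.rectangle).
    """
    boxes = []
    for x, y, crop in sliding_windows(image, win, stride):
        label, prob = classify(crop)
        if prob >= threshold:
            # The crop's position is the bounding-box location,
            # giving the one-to-one correspondence the comment mentions.
            boxes.append((x, y, win, win, label, prob))
    return boxes
```

Note that overlapping windows will produce multiple boxes around one animal, so in practice a step like non-maximum suppression is usually applied before annotating the image.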