
I am working on a convolutional neural network to identify objects such as animals, vehicles, and trees. One of the classes is "auto". When I gave an image to the network, it predicted "auto", but I also need to draw a bounding box around the object. When I tried a sliding-window approach, I got many bounding boxes, but I need only one. How can I find the most appropriate bounding box for an object after the network's prediction? Don't we need some method to localise objects within a large image? That is what I am looking for.

My final layer is a logistic regression function that predicts only 1 or 0. I don't know how to turn that prediction into a probability score. If I had a probability score for each box, it would be easy to pick the most appropriate one. Please suggest some methods for doing this. Thanks in advance. All answers are welcome.

INPUT, OUTPUT AND EXPECTED OUTPUT

Is there just one auto or several in an input image? - Justas
For now I have only one vehicle, but I will need to deal with multiple objects later. - Arun Sooraj

1 Answer


It's not clear if you have a single object in your input image or several. Your example shows one.

If you have ONE object, here are some options to consider for the bounding boxes:

  • Keep the most distant ones: Keep the top, bottom, right, and left boundaries that are farthest from the center of all the bounding boxes.
  • Keep the average ones: For example, take all the top boundaries and keep their average location. Repeat for the bottom, right, and left boundaries.
  • Keep the median ones: Same as the average, but keep the median of each boundary direction instead.
  • Keep the bounding box with the largest activation: Since you're using logistic regression as the final step, find the input that goes into that logistic layer and keep the bounding box with the largest input. Passing that input through the logistic (sigmoid) function gives a score between 0 and 1, and because the function is monotonic, the box with the largest input is also the box with the highest score.
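
The four options above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: each box is a `(left, top, right, bottom)` tuple in pixel coordinates with y growing downward, `logits` is a hypothetical list holding the input to the logistic layer for each box, and "most distant from the center" is read as the box that encloses all the others. The function names and sample values are made up for the example.

```python
import math
from statistics import mean, median

# Assumed box layout: (left, top, right, bottom), y grows downward.

def enclosing_box(boxes):
    """Keep the most distant boundary in each direction (box enclosing all boxes)."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))

def reduce_box(boxes, reducer):
    """Apply `reducer` (e.g. mean or median) to each boundary independently."""
    return tuple(reducer(side) for side in zip(*boxes))

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def best_box(boxes, logits):
    """Keep the box with the largest logistic-layer input, plus its score."""
    i = max(range(len(boxes)), key=lambda k: logits[k])
    return boxes[i], sigmoid(logits[i])

# Hypothetical sliding-window detections and their logistic-layer inputs.
boxes = [(10, 20, 110, 120), (14, 18, 118, 124), (30, 40, 130, 150)]
logits = [1.2, 2.5, -0.3]

merged_outer = enclosing_box(boxes)         # most distant boundaries
merged_mean = reduce_box(boxes, mean)       # average boundaries
merged_median = reduce_box(boxes, median)   # median boundaries
top_box, top_prob = best_box(boxes, logits) # largest activation
```

With several overlapping detections of one object, the median tends to be less sensitive to a single stray window than the average or the enclosing box, while the largest-activation option needs access to the value before the 0/1 threshold is applied.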