I'm trying to set up an object classification system with OpenCV. When I detect a new object in a scene, I want to know whether it belongs to a known object class (is it a box, a bottle, something unknown, etc.).

My steps so far:

  • Cropping the image to the ROI where a new object could appear
  • Calculating keypoints for every image (cv::SurfFeatureDetector)
  • Calculating descriptors for each keypoint (cv::SurfDescriptorExtractor)
  • Generating a vocabulary using Bag of Words (cv::BOWKMeansTrainer)
  • Calculating response histograms (cv::BOWImgDescriptorExtractor)
  • Using the response histograms to train a cv::SVM for every object class
  • Using the same set of images again to test the classification
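To make the response-histogram step concrete, here is a minimal standard-library sketch of what a Bag of Words histogram is: each descriptor votes for its nearest vocabulary word, and the counts are normalized by the number of descriptors. It is not the OpenCV implementation; the names `Descriptor`, `squaredDistance` and `bowHistogram` are hypothetical, and descriptors are assumed to be plain float vectors of equal length.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: a local descriptor (e.g. from SURF) stored as a plain float vector.
using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Builds a normalized histogram of visual-word occurrences for one image.
// vocabulary:  cluster centers (the "visual words").
// descriptors: all local descriptors extracted from the image.
std::vector<float> bowHistogram(const std::vector<Descriptor>& vocabulary,
                                const std::vector<Descriptor>& descriptors) {
    std::vector<float> histogram(vocabulary.size(), 0.0f);
    if (vocabulary.empty() || descriptors.empty()) {
        return histogram;  // nothing to count
    }
    for (const Descriptor& d : descriptors) {
        // Find the nearest visual word for this descriptor.
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t w = 0; w < vocabulary.size(); ++w) {
            const float dist = squaredDistance(d, vocabulary[w]);
            if (dist < bestDist) {
                bestDist = dist;
                best = w;
            }
        }
        histogram[best] += 1.0f;
    }
    // Normalize so images with different keypoint counts are comparable.
    for (float& bin : histogram) {
        bin /= static_cast<float>(descriptors.size());
    }
    return histogram;
}
```

Because every descriptor in the input contributes a vote, descriptors from the background end up in the histogram too, which is why the choice between the full image and the extracted object matters.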

I know that there is still something wrong with my code, since the classification doesn't work yet.

What I don't really know is when I should use the full image (cropped to the ROI) and when I should extract the new object from the image and use just the object itself.

This is my first step into object recognition/classification. I have seen people use both full images and extracted objects, but I just don't know when to use which.

I hope someone can clarify this for me.

1 Answer

You should not use the same images for both testing and training.
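As a rough illustration of keeping the two sets apart, the sketch below shuffles the sample indices and cuts them into disjoint training and test parts. The function name `splitTrainTest` and its parameters are hypothetical, and a random hold-out split is only one possible scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns (trainIndices, testIndices). The two sets are disjoint and together
// cover 0 .. sampleCount-1. testFraction is clamped to [0, 1].
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
splitTrainTest(std::size_t sampleCount, double testFraction, unsigned seed) {
    std::vector<std::size_t> indices(sampleCount);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    // Shuffle so the split does not depend on the order the images were loaded.
    std::mt19937 rng(seed);
    std::shuffle(indices.begin(), indices.end(), rng);

    const double fraction = std::min(1.0, std::max(0.0, testFraction));
    const auto testCount = static_cast<std::ptrdiff_t>(
        static_cast<double>(sampleCount) * fraction);

    std::vector<std::size_t> test(indices.begin(), indices.begin() + testCount);
    std::vector<std::size_t> train(indices.begin() + testCount, indices.end());
    return {train, test};
}
```

The response histograms at the training indices would go to the SVM, and only those at the test indices would be used to measure accuracy.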

In training, ideally you should extract an ROI that contains just one dominant object, because the algorithm assumes that the codewords extracted from positive samples are the ones that must be present in a test image for it to be labeled positive. However, if you have a really big dataset like ImageNet, the algorithm should be able to generalize anyway.

In testing, you don't need to extract an ROI, because SIFT/SURF are scale-invariant features. However, it's good to have one dominant object in each test image as well.

I think you should train one classifier for each object class. This is called a one-vs-all classifier.
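A minimal sketch of how the one-vs-all outputs could be combined at prediction time: take the class whose binary classifier gives the highest score, and fall back to "unknown" when no score exceeds a threshold. The `ClassScore` struct and `classifyOneVsAll` function are hypothetical, and the sketch assumes that a higher score means "more likely this class"; the sign convention of the SVM implementation you use should be checked.

```cpp
#include <string>
#include <vector>

// Assumption: one entry per trained binary classifier.
struct ClassScore {
    std::string label;  // e.g. "box", "bottle"
    double score;       // decision value; higher means more confident (assumed)
};

// Returns the label with the highest score, or "unknown" if no classifier
// scores above the threshold.
std::string classifyOneVsAll(const std::vector<ClassScore>& scores,
                             double threshold = 0.0) {
    std::string best = "unknown";
    double bestScore = threshold;
    for (const ClassScore& s : scores) {
        if (s.score > bestScore) {
            bestScore = s.score;
            best = s.label;
        }
    }
    return best;
}
```

For example, scores of 0.8 for "box" and -0.3 for "bottle" give "box", while scores that are all negative give "unknown", which covers the "something unknown" case from the question.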

One small note: if you don't want to worry about these issues and you have a big dataset, just go with Convolutional Neural Networks. They generalize really well and handle multiple classes in a single model, since their fully connected last layer produces one output per class.