What exactly is the output of the SURF algorithm and how can I use them for classification (SVM, etc.)?

Question

I am working on a project that tracks humans in aerial videos. One of the algorithms we will use is SURF. I understand that SURF uses interest points, but I'm quite confused about what comes after that. How exactly can I use the interest points for classification? I want to identify which detected objects in the video are humans and which are other objects, so of course I need a training set, but what will it consist of? I've read somewhere that BoW should be used, but are there any other ways of extracting these SURF features? If I read the original SURF paper by Herbert Bay correctly, it does not mention how the features were extracted, what the output was, or how they were prepared for classification.

I'm really confused. Please help. Thank you!

That's because SIFT and SURF weren't originally used for classification. Search for classification and SIFT (since SIFT is more common and is used the same way as SURF). Maybe try these links: dsp.stackexchange.com/questions/5979/… and robots.ox.ac.uk/~vgg/share/practical-image-classification.htm — Micka
or try this diploma thesis ;) is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/… — Micka

LovaBill · Accepted Answer · 2014-04-28T13:56:47

Let's say you divide an image into smaller rectangular areas called patches, each defined by (x, y, width, height). To describe the colors inside a patch, you calculate its histogram; the result is a vector of numbers (e.g. [5 11 2 4 5]). This output vector is a description vector (a descriptor). If you extract descriptors from all patches, the method is called dense sampling. If only some of the patches are important, you use keypoints to specify which ones are significant and which are not.
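As a rough illustration of the patch and histogram idea above, here is a minimal NumPy sketch. The function names `patch_histogram` and `dense_descriptors`, the 8x8 patch size, the 5 bins, and the synthetic image are all assumptions made for the example; this is a plain gray-value histogram, not the SURF descriptor.

```python
import numpy as np

def patch_histogram(image, x, y, width, height, bins=5):
    """Describe one patch by a histogram of its gray values."""
    patch = image[y:y + height, x:x + width]
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist

def dense_descriptors(image, patch_size=8, bins=5):
    """Dense sampling: one descriptor for every non-overlapping patch."""
    rows, cols = image.shape
    descriptors = []
    for y in range(0, rows - patch_size + 1, patch_size):
        for x in range(0, cols - patch_size + 1, patch_size):
            descriptors.append(
                patch_histogram(image, x, y, patch_size, patch_size, bins))
    return np.array(descriptors)

# A synthetic 32x32 grayscale image stands in for a real video frame.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Dense sampling: every patch gets a descriptor (16 patches of 8x8).
dense = dense_descriptors(image)

# Keypoint-style sampling: only patches at chosen locations are described.
# These top-left corners are hypothetical; a real detector would supply them.
keypoints = [(0, 0), (16, 8)]
sparse = np.array([patch_histogram(image, x, y, 8, 8) for x, y in keypoints])
```

Either way the result is a matrix with one descriptor per row; the only difference is which patches were chosen.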

Keypoints are simply points of greater significance than other points in an image. A descriptor is a vector that encodes color/shape/texture information of a small area (patch).

Edit: The output of SURF is a cv::Mat in which each row is the descriptor of one keypoint: a vector of 64 values (L2-normalized). You can compare two L2-normalized vectors with the L2 norm of their difference (Euclidean distance).
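To make the comparison concrete, here is a small sketch of the Euclidean distance between L2-normalized descriptors. It assumes a descriptor matrix with one 64-value descriptor per row; the random values are only a stand-in for real SURF output, and the helper names are made up for this example.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)

def euclidean_distance(a, b):
    """L2 norm of the difference between two descriptors."""
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(1)
# Stand-in for the descriptor matrix: 3 keypoints, 64 values per row.
descriptors = np.array([l2_normalize(row) for row in rng.random((3, 64))])

query = descriptors[0]
distances = [euclidean_distance(query, d) for d in descriptors]

# Index of the closest descriptor other than the query itself.
best_match = int(np.argmin(distances[1:])) + 1
```

Because both vectors have unit length, the distance always lies between 0 (identical) and 2 (opposite), which makes distances comparable across descriptor pairs.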

Edit2: A classifier is a different story. I suggest you study the tutorial http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html, keeping in mind that where the tutorial uses a 2D point, each sample in your case is a descriptor of 64 values.
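To show what "a sample is a 64-value descriptor" means in practice, here is a hedged sketch of a linear SVM trained on 64-dimensional vectors. It is a plain subgradient-descent implementation in NumPy, not OpenCV's SVM and not the tutorial's code. The training data is synthetic, the labels (+1 for "human", -1 for "other") are hypothetical, and the hyperparameters `lam`, `epochs` and `lr` are arbitrary choices for the example.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by subgradient descent.

    X: (n_samples, 64) matrix, one descriptor per row.
    y: labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Return +1 or -1 for each row of X."""
    return np.where(X @ w + b >= 0, 1, -1)

def make_samples(rng, mean, count):
    """Synthetic unit-length 64-value vectors; NOT real SURF output."""
    rows = rng.normal(mean, 0.1, size=(count, 64))
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

rng = np.random.default_rng(2)
X_train = np.vstack([make_samples(rng, 0.5, 20), make_samples(rng, -0.5, 20)])
y_train = np.concatenate([np.ones(20), -np.ones(20)])  # +1 "human", -1 "other"

w, b = train_linear_svm(X_train, y_train)
train_accuracy = float((predict(X_train, w, b) == y_train).mean())
```

The only change from the 2D tutorial setting is the number of columns in the training matrix: each row is still one sample with one label.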
