3
votes

I'm implementing a Bag-of-Words image classifier using OpenCV. Initially I tested SURF descriptors extracted at SURF keypoints. I've heard that Dense SIFT (or PHOW) descriptors can work better for my purposes, so I tried them too.

To my surprise, they performed significantly worse, almost 10 times worse. What could I be doing wrong? I'm using DenseFeatureDetector from OpenCV to get the keypoints. I extract about 5000 descriptors per image from 9 layers and cluster them into 500 clusters.

Should I try the PHOW descriptors from the VLFeat library? Also, I can't use the chi-square kernel, which many papers recommend, in OpenCV's SVM implementation. Is it crucial to classifier quality? Should I try another library?

Another question concerns scale invariance: I suspect it can be affected by dense feature extraction. Am I right?


1 Answer

8
votes

It depends on the problem. You should try different techniques to find out which one works best for your problem. PHOW is usually very useful when you need to classify scenes of any kind. Note that PHOW is a little different from plain Dense SIFT. I used vlfeat PHOW a few years ago, and judging from the code, it just calls dense SIFT with several bin sizes, plus some smoothing. That multi-scale extraction could be one clue to how it gets some invariance to scale.

In my experiments I used libsvm, and the histogram intersection kernel turned out to be the best one for me. By default, chi-square and histogram intersection kernels are included in neither libsvm nor OpenCV's SVM (which is based on libsvm), so it's up to you to decide whether to try them. I can tell you that the RBF kernel achieved nearly 90% accuracy, while histogram intersection reached 93% and chi-square 91%. But those results come from my specific experiments. You should start with RBF with auto-tuned parameters and see if it's enough.
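In case it helps, here is a minimal sketch of the two histogram kernels mentioned above, written in plain C++ over BoW histograms. The function names and the `gamma` parameter are my own illustrative choices, not part of libsvm or OpenCV, and the chi-square kernel shown is the exponential variant (other formulations exist).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Histogram intersection kernel: sum of the element-wise minima.
double histogramIntersection(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::min(a[i], b[i]);
    }
    return sum;
}

// Chi-square distance: sum of (a_i - b_i)^2 / (a_i + b_i).
// Bins that are empty in both histograms are skipped to avoid dividing by zero.
double chiSquareDistance(const std::vector<double>& a,
                         const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double denom = a[i] + b[i];
        if (denom > 0.0) {
            const double diff = a[i] - b[i];
            sum += diff * diff / denom;
        }
    }
    return sum;
}

// Exponential chi-square kernel. gamma is a hypothetical tuning parameter
// that would have to be chosen by cross-validation.
double chiSquareKernel(const std::vector<double>& a,
                       const std::vector<double>& b,
                       double gamma) {
    return std::exp(-gamma * chiSquareDistance(a, b));
}
```

One way to use these would be to evaluate the kernel for every pair of training histograms and hand the resulting matrix to libsvm through its precomputed kernel type, instead of patching the library itself.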

In summary, it all depends on your specific experiments. But if you use Dense SIFT, you could try increasing the number of clusters and calling Dense SIFT at several scales (I recommend the PHOW way).
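To make the role of the number of clusters concrete, here is a rough sketch of how a BoW histogram is typically built: each descriptor votes for its nearest cluster center, and the counts are normalized. This is plain C++ for illustration only, not OpenCV's implementation; `Descriptor`, `nearestCenter` and `bowHistogram` are names I made up, and the L1 normalization is an assumption about the pipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Index of the cluster center closest to d (squared Euclidean distance).
std::size_t nearestCenter(const Descriptor& d,
                          const std::vector<Descriptor>& centers) {
    assert(!centers.empty());
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centers.size(); ++c) {
        assert(centers[c].size() == d.size());
        double dist = 0.0;
        for (std::size_t i = 0; i < d.size(); ++i) {
            const double diff = static_cast<double>(d[i]) - centers[c][i];
            dist += diff * diff;
        }
        if (dist < bestDist) {
            bestDist = dist;
            best = c;
        }
    }
    return best;
}

// One bin per cluster; each descriptor adds a vote to its nearest center.
// The histogram is L1-normalized so images with different descriptor
// counts stay comparable.
std::vector<double> bowHistogram(const std::vector<Descriptor>& descriptors,
                                 const std::vector<Descriptor>& centers) {
    std::vector<double> hist(centers.size(), 0.0);
    for (const Descriptor& d : descriptors) {
        hist[nearestCenter(d, centers)] += 1.0;
    }
    if (!descriptors.empty()) {
        for (double& bin : hist) {
            bin /= static_cast<double>(descriptors.size());
        }
    }
    return hist;
}
```

With thousands of dense descriptors per image, a small vocabulary forces many different patches into the same bin, which is one reason a larger number of clusters may be worth trying.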

EDIT: I was looking at OpenCV DenseSift, and maybe you could start with

m_detector = new DenseFeatureDetector(4, 4, 1.5); // initFeatureScale = 4, featureScaleLevels = 4, featureScaleMul = 1.5

Knowing that vlfeat PHOW uses [4 6 8 10] as bin sizes.
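To see which scales those three arguments would give, here is a small sketch. It assumes the detector starts at the initial feature scale and multiplies it by the scale multiplier once per level, which is how I read the OpenCV 2.4 behaviour; `denseScales` is a hypothetical helper, not an OpenCV function.

```cpp
#include <cassert>
#include <vector>

// Scales a dense detector would visit, assuming a geometric progression:
// scale_k = initFeatureScale * featureScaleMul^k for k = 0 .. levels - 1.
std::vector<float> denseScales(float initFeatureScale,
                               int featureScaleLevels,
                               float featureScaleMul) {
    assert(featureScaleLevels >= 0);
    std::vector<float> scales;
    float scale = initFeatureScale;
    for (int level = 0; level < featureScaleLevels; ++level) {
        scales.push_back(scale);
        scale *= featureScaleMul;
    }
    return scales;
}
```

Under that assumption, `denseScales(4.0f, 4, 1.5f)` yields 4, 6, 9 and 13.5, which is close to, but not the same as, the [4 6 8 10] bin sizes used by PHOW.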