I'm implementing a Bag-of-Words image classifier using OpenCV. Initially I tested SURF descriptors extracted at SURF keypoints. I had heard that Dense SIFT (or PHOW) descriptors can work better for my purposes, so I tried them too.
To my surprise, they performed significantly worse, almost 10 times worse. What could I be doing wrong? I'm using DenseFeatureDetector from OpenCV to get the keypoints. I extract about 5000 descriptors per image from 9 layers and cluster them into 500 clusters.
Should I try the PHOW descriptors from the VLFeat library? Also, I can't use the chi-square kernel, which is recommended in many papers, with OpenCV's SVM implementation. Is it crucial to classifier quality? Should I try another library?
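For reference, this is the kind of kernel I mean, as a minimal sketch in plain Python rather than OpenCV code. The function names, the `gamma` parameter, and the `eps` guard are my own illustrative choices, and this is only one common form of the exponential chi-square kernel (some papers include an extra factor of 1/2 in the distance).

```python
import math

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two non-negative histograms of equal length."""
    # eps (an assumption, not part of the textbook formula) avoids division
    # by zero when both bins are empty.
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel: exp(-gamma * chi2_distance(h1, h2))."""
    return math.exp(-gamma * chi2_distance(h1, h2))

def gram_matrix(histograms, gamma=1.0):
    """Pairwise kernel values for a list of Bag-of-Words histograms."""
    return [[chi2_kernel(a, b, gamma) for b in histograms] for a in histograms]
```

As far as I understand, a matrix like this could be handed to any SVM implementation that accepts a precomputed kernel, which is why I'm wondering about switching libraries.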
Another question is about scale invariance: I suspect that it can be affected by dense feature extraction. Am I right?
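To make the concern concrete: a dense detector places keypoints on a fixed grid with preset sizes instead of picking a scale from the image content. The sketch below is a plain-Python illustration of that idea, not the actual DenseFeatureDetector code; `dense_grid` and all of its parameters and default values are hypothetical and are not the settings I used.

```python
def dense_grid(width, height, step=8, base_size=8.0, levels=3, scale_mul=1.5):
    """Return (x, y, size) tuples on a regular grid, repeated at several sizes.

    Every keypoint size is fixed in advance, so any scale robustness comes
    only from sampling several sizes, not from per-keypoint scale selection.
    """
    keypoints = []
    size = base_size
    for _ in range(levels):
        # Keep the whole patch inside the image.
        margin = int(round(size / 2))
        for y in range(margin, height - margin, step):
            for x in range(margin, width - margin, step):
                keypoints.append((x, y, size))
        size *= scale_mul
    return keypoints
```

With a grid like this, the same object seen at a different size is covered by patches of the same preset sizes, so its descriptors change unless one of the sampled sizes happens to match.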