2 votes

I am classifying medical images using a bag-of-words (BOW) model. I tried two approaches to extract the feature vectors:

  1. extract features from small image patches and then apply BOW on those features
  2. extract pixel values from small image patches then apply BOW on those pixel values
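The pipeline behind approach 2 can be sketched as follows. This is a minimal, illustrative version (not the asker's actual code): flatten small patches into raw pixel vectors, cluster them with a tiny hand-rolled k-means to build a visual codebook, then describe each image by its normalized histogram of codeword assignments. Patch size, stride, and cluster count are arbitrary placeholder values.

```python
import numpy as np

def extract_patches(img, size=8, stride=8):
    """Slide a window over a grayscale image and flatten each patch to a vector."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size].ravel())
    return np.array(patches, dtype=float)

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns the cluster centers (the visual codebook)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(img, centers, size=8, stride=8):
    """Assign each patch to its nearest codeword and count occurrences."""
    P = extract_patches(img, size, stride)
    labels = np.argmin(((P[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()  # normalize so image size does not matter
```

Approach 1 is the same pipeline, except each patch is first mapped through a feature extractor (SIFT, texture filters, etc.) before clustering.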

After feature extraction I tried PCA, feature selection, changing the number of clusters for k-means, etc., to improve the accuracy. But in my case the BOW learned on pixel values (2) achieves 90% accuracy and outperforms the BOW learned on features (1), which only reaches 70%. My features are good: when I use those same features to classify the images with another framework, I get more than 95% accuracy.
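For reference, the PCA step mentioned above can be written in a few lines of NumPy via the SVD; this is a generic sketch, not the asker's setup, and `n_components` is a placeholder to be tuned.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project row vectors of X onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                    # center each dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T            # shape: (n_samples, n_components)
```

Applied before k-means, this reduces each patch descriptor to its `n_components` highest-variance directions.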

My question is: why does the BOW learned on pixel values perform better than the BOW learned on features?


Normal-abnormal colonoscopy image classification

    Figure 1: a normal colon image
    Figure 2: an image with polyp
Could you show some images if possible and tell what categories you are trying to identify? - Maurits
It is a normal vs abnormal colonoscopy image classification. - user570593
Sorry, but I have no idea what that would look like. - Maurits
Please look at the figures that I uploaded now. - user570593
And what features are you extracting? - Maurits

1 Answer

3 votes

My understanding of your two methods for extracting features from an image patch is:

Feature selection = "run PCA, k-means, or select some subset of pixels, and construct a vector of these extracted values"

Pixel Values = "create a vector from RGB values of the image"

In fact, to get good results from BOW features, people often derive individual features using relatively complicated algorithms.

In the project at http://vision.stanford.edu/projects/totalscene/index.html (paper in reference #1), the authors take BOW features from both image blocks and a segmentation. For the image blocks they extract SIFT features, and for each segment they take shape, color, location, and texture features (see section 2.1 and follow the reference for a better description of the features they use).

In "Decomposing a Scene into Geometric and Semantically Consistent Regions" (Gould et al.), shape, color, edge, and similar features are derived by doing things like training boosted logistic regression classifiers, Potts models, and Gaussian mixture models.

You probably don't need such intensive techniques to extract features that beat pixel vectors, but you should definitely browse around the literature to see what is effective.

SIFT features, color histograms, and filters to extract texture responses seem to work pretty well and also have a reasonable amount of software library support.
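Of the three, a color histogram is the simplest to implement from scratch. Here is a minimal sketch (SIFT and texture filters would need a library such as OpenCV): concatenate a per-channel histogram over an H x W x 3 image with 8-bit channel values, then normalize. The bin count is an arbitrary choice.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel histogram; img is H x W x 3 with values in [0, 256)."""
    feats = []
    for c in range(3):
        h, _ = np.histogram(img[:, :, c], bins=bins, range=(0, 256))
        feats.append(h)
    v = np.concatenate(feats).astype(float)
    return v / v.sum()  # normalize so the descriptor is invariant to image size
```

Descriptors like this, computed per patch or per segment, are the kind of inputs that tend to give BOW an edge over raw pixel vectors.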