For image classification, I have to encode images in a bag-of-visual-words representation and train an SVM classifier. What is the process for computing the bag-of-visual-words encoding of an image?

1 Answer

Here is the procedure:

First, you have to construct a visual dictionary:

  1. Sample patches from a training image, either on a dense grid or around detected key points. In the dense case, simply decompose the image into equally spaced patches.
  2. Repeat the previous step for all your training images. Then, for each patch, compute the SIFT descriptor, which yields a 128-D vector.
  3. Doing this for all patches of all images gives you a bank of 128-D feature vectors. Cluster these descriptors into K clusters (k-means is a common choice) and save the cluster centers. These centers form the visual dictionary of your model.
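
The dictionary steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not a production pipeline: `describe` just flattens the raw pixels as a stand-in for a real 128-D SIFT descriptor, the k-means is a bare-bones version, and all function names and the toy images are made up for the example.

```python
import random

def dense_patches(image, size, stride):
    """Yield flattened size x size patches taken on a regular grid."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            yield [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]

def describe(patch):
    # Stand-in for SIFT (assumption): use the raw pixel values as the descriptor.
    return [float(v) for v in patch]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centers[i]))
            groups[nearest].append(v)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def build_dictionary(images, k, size=2, stride=2):
    """Pool the descriptors of all patches of all images, then cluster them."""
    descriptors = [describe(p) for img in images
                   for p in dense_patches(img, size, stride)]
    return kmeans(descriptors, k)

# Two toy 4x4 grayscale "images" made of dark (0) and bright (9) blocks.
images = [
    [[0, 0, 9, 9], [0, 0, 9, 9], [9, 9, 0, 0], [9, 9, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0], [9, 9, 9, 9], [9, 9, 9, 9]],
]
dictionary = build_dictionary(images, k=2)
```

With real data you would swap `describe` for an actual SIFT implementation and use a library k-means, since the pooled descriptor bank is usually far too large for a pure-Python loop.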

After constructing the visual dictionary, encode each image:

  1. Apply the same sampling (dense or key-point) to the target image.

  2. Compute the SIFT descriptor for each patch of the target image.

  3. Find the cluster each patch falls into, i.e. the nearest cluster center. That center (visual word) represents the patch.

  4. Compute a histogram of how many times each visual word occurs in the target image. This K-bin histogram is the descriptor (representation) of your image.
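
As a rough sketch of the assignment and histogram steps, assuming you already have the descriptors of one image and a dictionary of cluster centers: the 2-D vectors below are toy values standing in for 128-D SIFT descriptors, and the `normalize` option is an extra that is not part of the steps above (it is commonly used so that images with different numbers of patches stay comparable).

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, dictionary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(range(len(dictionary)),
               key=lambda i: sq_dist(descriptor, dictionary[i]))

def encode(descriptors, dictionary, normalize=False):
    """Bag-of-visual-words histogram: one bin per visual word."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        hist[nearest_word(d, dictionary)] += 1
    if normalize and descriptors:
        return [count / len(descriptors) for count in hist]
    return hist

# Toy dictionary with K = 3 visual words and five patch descriptors.
dictionary = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1, 1], [9, 9], [0, 1], [1, 9], [10, 8]]
histogram = encode(descriptors, dictionary)
```

Note that the histogram always has K entries, no matter how many patches the image produced, which is what makes it usable as a fixed-length feature vector.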

Doing this for every image in your training set gives you one histogram per image, and you can train any off-the-shelf classifier (such as an SVM) on those histograms to classify images.
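
To illustrate the last step without pulling in a library, here is a tiny linear SVM trained by subgradient descent on the hinge loss. It is only a stand-in for an off-the-shelf SVM: it handles two classes with labels +1 and -1, the hyperparameters are arbitrary, and the histograms are made-up toy values.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: list of histograms, y: labels in {-1, +1}. Returns (w, b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (dot(w, x) + b)
            # The L2 regularisation term shrinks w on every step.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:  # hinge loss is active: move towards the sample
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Toy normalized histograms over K = 3 visual words.
# Class +1 is dominated by word 0, class -1 by word 2.
X = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)

label_a = predict(w, b, [0.9, 0.05, 0.05])
label_b = predict(w, b, [0.05, 0.05, 0.9])
```

For more than two classes you would typically train one such classifier per class (one-vs-rest) or simply use a library SVM that handles this for you.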

Here is the visualization of the pipeline:

[Image: diagram of the bag-of-visual-words pipeline]