bag of visual words encoding process

Question

For image classification, I have to encode images in a bag-of-visual-words representation and train an SVM classifier. What is the process for building the bag-of-visual-words encoding of an image?

Saeed Saeed · Accepted Answer · 2015-10-17T20:16:30

Here is the procedure:

First you have to construct a dictionary

First, sample the training images (dense or key-point sampling). For dense sampling, simply decompose each image into equally spaced patches.
Repeat the previous step for all your training images. Then, for each patch, compute the SIFT descriptor, which yields a 128-D vector.
Doing this for all patches of all images yields a bank of 128-D feature vectors. Cluster these descriptors into K clusters (for example with k-means) and save their centers. These centers form the visual dictionary of your model.
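
The dictionary-construction steps above can be sketched roughly as follows. This is only a minimal illustration, not a reference implementation: it assumes the SIFT descriptors have already been extracted and stacked into an N x 128 array (random data stands in for them here), it uses a plain Lloyd's k-means written in NumPy, and the names `build_dictionary` and `descriptor_bank` are made up for the example. In practice you would probably use a library k-means instead.

```python
import numpy as np

def build_dictionary(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors (N x D) into k visual words with plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    descriptors = np.asarray(descriptors, dtype=np.float64)
    # Initialise the centers with k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every center: (N, k).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

# Stand-in for the bank of 128-D SIFT descriptors (random data, illustration only).
rng = np.random.default_rng(0)
descriptor_bank = rng.random((500, 128))

# K = 10 is an arbitrary choice for the example; real dictionaries are much larger.
dictionary = build_dictionary(descriptor_bank, k=10)
```

Each row of `dictionary` is one visual word, so the array has K rows and 128 columns.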

After constructing the Visual Dictionary

Apply the same sampling (dense/key-point) to the target image.
Compute the SIFT descriptor for each patch of the target image.
Determine which cluster each patch falls into, i.e. find the cluster center nearest to its descriptor. That center (visual word) represents the patch.
Compute a histogram counting how many times each visual word occurs in the target image. This histogram is the descriptor (representation) of your image.
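
As a rough sketch of the encoding steps above: assign each descriptor to its nearest dictionary center, then count the assignments. The function name `encode_bovw` and the tiny 2-D dictionary are hypothetical and chosen only so the result is easy to check by hand; real descriptors would be 128-D SIFT vectors. The optional L1 normalisation is an addition of this sketch, not part of the procedure described above.

```python
import numpy as np

def encode_bovw(descriptors, dictionary, normalize=False):
    """Return the visual-word histogram of one image.

    descriptors: (N, D) array, one row per patch of the image.
    dictionary:  (K, D) array of cluster centers (visual words).
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    dictionary = np.asarray(dictionary, dtype=np.float64)
    # Squared Euclidean distance from every descriptor to every visual word: (N, K).
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word for each patch
    hist = np.bincount(words, minlength=len(dictionary)).astype(np.float64)
    if normalize:
        # Assumption of this sketch: L1-normalise so images with different
        # numbers of patches become comparable.
        hist /= hist.sum()
    return hist

# Toy dictionary with three 2-D "visual words" (illustration only).
toy_dictionary = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
# Five toy patch descriptors from one image.
toy_descriptors = np.array([[1, 0], [9, 9], [0, 1], [1, 9], [10, 11]])

hist = encode_bovw(toy_descriptors, toy_dictionary)
print(hist)  # [2. 2. 1.]
```

The histogram always has length K regardless of how many patches the image has, which is what lets a classifier consume it as a fixed-length feature vector.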

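Doing so for every image in your training set gives one fixed-length histogram per image, on which you can train any off-the-shelf classifier (such as the SVM from the question) to classify images.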

Here is the visualization of the pipeline:

[Figure: bag of visual words encoding process]
