4 votes

I am trying to calculate the dense trajectory features of a video, as in https://hal.inria.fr/hal-00725627/document. I am using OpenCV's HOG descriptor like this:

    import cv2

    img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)  # a single video frame

    winSize = (32, 32)
    blockSize = (32, 32)
    blockStride = (2, 2)
    cellSize = (2, 2)
    nbins = 9

    hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)
    hist = hog.compute(img)

However, this returns a very large feature vector of size (160563456, 1).
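As a sanity check on where that number comes from, here is a minimal sketch, assuming OpenCV's standard HOG layout (one nbins histogram per cell, per block, per window):

    import cv2

    winSize = (32, 32)
    blockSize = (32, 32)
    blockStride = (2, 2)
    cellSize = (2, 2)
    nbins = 9

    hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)

    # Block positions per window: (winSize - blockSize) / blockStride + 1 per axis
    blocks_per_window = (((winSize[0] - blockSize[0]) // blockStride[0] + 1)
                         * ((winSize[1] - blockSize[1]) // blockStride[1] + 1))  # 1
    # Cells per block: blockSize / cellSize per axis
    cells_per_block = ((blockSize[0] // cellSize[0])
                       * (blockSize[1] // cellSize[1]))                          # 256

    print(blocks_per_window * cells_per_block * nbins)  # 2304 values per window
    print(hog.getDescriptorSize())                      # should also print 2304

compute() then stacks one such 2304-value descriptor for every window position it evaluates across the image (160563456 / 2304 = 69689 positions here), which is why the output is so large.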

What is a window (winSize)? What is a block? What is a cell? The documentation isn't particularly helpful in explaining what each of these parameters is.

From http://www.learnopencv.com/histogram-of-oriented-gradients/ I see that to compute HOGs we create a histogram for each cell of an image patch and then normalise over the patch.
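For intuition, here is a rough numpy sketch of that idea; it is my own illustration, not OpenCV's exact implementation (which adds details such as vote interpolation and overlapping blocks):

    import numpy as np

    def cell_histogram(cell, nbins=9):
        # Per-pixel gradients (np.gradient returns d/dy first, then d/dx)
        gy, gx = np.gradient(cell.astype(np.float64))
        mag = np.hypot(gx, gy)
        ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
        # Magnitude-weighted orientation histogram
        hist, _ = np.histogram(ang, bins=nbins, range=(0, 180), weights=mag)
        return hist

    # One histogram per 16x16 cell of a 32x32 patch, then normalize over the patch
    patch = np.random.rand(32, 32)
    hists = [cell_histogram(patch[y:y + 16, x:x + 16])
             for y in (0, 16) for x in (0, 16)]
    feat = np.concatenate(hists)
    feat /= np.linalg.norm(feat) + 1e-6  # L2 normalization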

What I want is four 9-bin histograms for each (32, 32) patch of my image, calculated from the histograms of the (16, 16) cells of that patch. So I would expect a final HOG feature of size 40716 for a (480, 640) image:

(((32*32) / (16*16)) * 9) * (((480-16)/16) * ((640-16)/16)) = 40716

((patchSize / cellSize) * numBins) * numPatches = hogSize
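A quick check of that arithmetic in Python (the patch count assumes 32x32 patches sliding with a 16-pixel stride, which is what the (480-16)/16 terms amount to):

    patch_size = 32
    cell_size = 16
    nbins = 9
    h, w = 480, 640

    values_per_patch = (patch_size // cell_size) ** 2 * nbins  # 4 cells * 9 bins = 36
    n_patches = (((h - cell_size) // cell_size)
                 * ((w - cell_size) // cell_size))             # 29 * 39 = 1131
    print(values_per_patch * n_patches)                        # 40716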

I have also seen people doing stuff like this:

    winStride = (8, 8)
    padding = (8, 8)
    locations = ((10, 20),)
    hist = hog.compute(image, winStride, padding, locations)

However, I don't understand what the locations parameter does, as I don't want to compute the HOG features at a single location only, but for all (32, 32) patches of my image.
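From what I can tell, locations restricts the computation to the listed top-left window positions; if you omit it, compute() slides the window over the whole image in winStride steps. A sketch under that assumption, using parameters that match the patch layout described above (32x32 windows made of four 16x16 cells):

    import cv2

    img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
    hog = cv2.HOGDescriptor((32, 32), (32, 32), (16, 16), (16, 16), 9)

    # One 36-value descriptor (4 cells * 9 bins) per window, every 16 px:
    all_patches = hog.compute(img, winStride=(16, 16), padding=(0, 0))

    # The same descriptor for a single window with top-left corner (x=10, y=20):
    one_patch = hog.compute(img, winStride=(16, 16), padding=(0, 0),
                            locations=((10, 20),))

For a (480, 640) frame the first call should return 29 * 39 = 1131 windows of 36 values each, i.e. the 40716 numbers expected above.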

2 The documentation and the linked document explain window, block and cell. Why do you ask what they are? What is not clear about those explanations? – Piglet

2 Answers

4 votes
    import cv2

    img = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)  # any grayscale image

    cell_size = (16, 16)  # h x w in pixels
    block_size = (2, 2)   # h x w in cells
    nbins = 9             # number of orientation bins

    # winSize is the size of the image cropped to a multiple of the cell size.
    # cell_size is the size of the cells of the image patch over which to
    # calculate the histograms.
    # block_size is the number of cells that fit in a block (patch).
    hog = cv2.HOGDescriptor(_winSize=(img.shape[1] // cell_size[1] * cell_size[1],
                                      img.shape[0] // cell_size[0] * cell_size[0]),
                            _blockSize=(block_size[1] * cell_size[1],
                                        block_size[0] * cell_size[0]),
                            _blockStride=(cell_size[1], cell_size[0]),
                            _cellSize=(cell_size[1], cell_size[0]),
                            _nbins=nbins)

    hog_feats = hog.compute(img)
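Continuing from the snippet above, a short sanity check of the output length (this assumes the standard layout: one nbins histogram per cell, per block, with the window spanning the whole cropped image and blocks stepping one cell at a time):

    n_cells = (img.shape[0] // cell_size[0], img.shape[1] // cell_size[1])
    n_blocks = ((n_cells[0] - block_size[0] + 1)
                * (n_cells[1] - block_size[1] + 1))
    assert hog_feats.size == n_blocks * block_size[0] * block_size[1] * nbins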
2 votes

We divide the image into cells of m x n pixels, say 8x8. A 64x64 image would then give an 8x8 grid of cells, each 8x8 pixels.

To reduce overall brightness effects, we add a normalization step to the feature calculation. A block contains several cells, and instead of normalizing each cell individually we normalize across the whole block. A 32x32 pixel block would contain 4x4 cells of 8x8 pixels.

A window is the part of the image we calculate the feature descriptor for. Let's say you want to find something of 64x64 pixels in a large image. You would then slide a 64x64 pixel window across the image and calculate the feature descriptor for each location, which you then use to find the location of the best match...
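Mapping that terminology onto the OpenCV constructor, a sketch with the example numbers above (the 8-pixel block stride is my assumption, giving overlapping blocks):

    import cv2

    hog = cv2.HOGDescriptor(
        _winSize=(64, 64),    # the sliding detection window
        _blockSize=(32, 32),  # normalization unit: 4x4 cells
        _blockStride=(8, 8),  # step between blocks (assumed; gives overlap)
        _cellSize=(8, 8),     # histogram unit
        _nbins=9)

    # Per-window size: 5*5 block positions * 16 cells/block * 9 bins = 3600
    print(hog.getDescriptorSize())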

It's all in the documentation. Just read it and experiment until you understand it. If you can't follow the documentation, read the source code and see what is going on line by line.