I am trying to compute dense trajectory features for a video, as in https://hal.inria.fr/hal-00725627/document. I am trying to use OpenCV's HOG descriptor like this:
import cv2
winSize = (32,32)
blockSize = (32,32)
blockStride = (2,2)
cellSize = (2,2)
nbins = 9
hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins)
hist = hog.compute(img)
However, this returns a very large feature vector of size: (160563456, 1).
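As a sanity check on that number, the per-window descriptor length can be worked out from the parameters alone. The sketch below assumes the usual HOG layout, where every block position inside a window contributes one histogram per cell. `hog_window_length` is a hypothetical helper written for this question, not an OpenCV function.

```python
def hog_window_length(win_size, block_size, block_stride, cell_size, nbins):
    """Descriptor length for ONE window, assuming the standard HOG layout."""
    # Number of block positions that fit inside the window along each axis.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block is split into cells; each cell yields one nbins histogram.
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins


# Parameters from the snippet above.
per_window = hog_window_length((32, 32), (32, 32), (2, 2), (2, 2), 9)
print(per_window)               # 2304
print(160563456 // per_window)  # 69689
```

If this reading is right, a single (32,32) window with (2,2) cells already yields 2304 values (256 cells, 9 bins each), and the returned vector would correspond to 69689 window positions.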
What is a window (winSize)? What is a block? What is a cell? The documentation isn't particularly helpful at explaining what each of these parameters means.
From http://www.learnopencv.com/histogram-of-oriented-gradients/ I see that to compute HOGs we create a histogram for each cell of an image patch and then normalise over the patch.
What I want is four 9-bin histograms for each (32,32) patch of my image, one computed from each of the (16,16) cells of that patch. So I would expect a final HOG feature of size 40716 for a (480,640) image:
(((32*32) / (16*16)) * 9) * (((480-16)*(640-16)) / (32*32) * 4) = 40716
((patchArea / cellArea) * numBins) * numPatches = hogSize
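The arithmetic above can be checked in a few lines. This sketch assumes the patch-count term, read as (480-16)*(640-16)/(32*32)*4, describes (32,32) patches placed every 16 pixels, i.e. overlapping by half. The variable names are mine and the stride of 16 is an assumption, not something taken from OpenCV.

```python
height, width = 480, 640
patch, cell, nbins = 32, 16, 9
stride = 16  # assumption: patches overlap by half

# Four (16,16) cells per (32,32) patch, one 9-bin histogram per cell.
hists_per_patch = (patch * patch) // (cell * cell)
values_per_patch = hists_per_patch * nbins

# Number of patch positions along each axis for the assumed stride.
patches_y = (height - patch) // stride + 1
patches_x = (width - patch) // stride + 1
num_patches = patches_y * patches_x

hog_size = values_per_patch * num_patches
print(values_per_patch, num_patches, hog_size)  # 36 1131 40716
```

Under that assumption the expected size corresponds to 29 x 39 = 1131 patch positions with 36 values each.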
I have also seen people call compute with extra arguments, like this:
winStride = (8,8)
padding = (8,8)
locations = ((10,20),)
hist = hog.compute(image,winStride,padding,locations)
However, I don't understand what the locations parameter does. I do not want to compute the HOG features at a single location only, but for all (32,32) patches of my image.