I am trying to classify the yard-line digits on a football field. I can already detect them well (using a different method), and I have a minimal bounding box drawn around each tens-place digit (1, 2, 3, 4, 5). My goal is to classify them.

I've been trying to train an SVM classifier on HOG features that I extract from the training set. A small subset of my training digits is here: http://ssadanand.imgur.com/all/

While training, I visualize my HOG descriptors and they look correct. I use a 64x128 training window and the other default parameters of OpenCV's HOGDescriptor.

Once I have extracted features from my images (50 samples per class, 5 classes), I have a 250x3780 training matrix and a 1x250 label vector holding the class labels, which I feed to a CvSVM object. Here is where I have a problem.
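For reference, the 3780 figure follows directly from the HOGDescriptor defaults. Here is a minimal sketch of the arithmetic, assuming the default 16x16 blocks, 8x8 block stride, 8x8 cells and 9 orientation bins (the function name hogLength is made up for illustration and is not part of OpenCV):

```cpp
// Number of floats in a HOG descriptor for one detection window.
// All sizes are in pixels; blocks slide across the window by blockStride.
// hogLength is a hypothetical helper, not an OpenCV function.
int hogLength(int winW, int winH, int block, int blockStride, int cell, int bins) {
    int blocksX = (winW - block) / blockStride + 1;       // block positions per row
    int blocksY = (winH - block) / blockStride + 1;       // block positions per column
    int cellsPerBlock = (block / cell) * (block / cell);  // e.g. 2 * 2 = 4
    return blocksX * blocksY * cellsPerBlock * bins;
}
```

With the defaults, hogLength(64, 128, 16, 8, 8, 9) works out to 7 * 15 * 4 * 9 = 3780, so 250 samples give the 250x3780 matrix. A smaller window shrinks this quickly: a 32x64 window with the same block settings would give 3 * 7 * 4 * 9 = 756 values.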

I tried using the default CvSVMParams() while using CvSVM. Terrible performance when tested on the training set itself!

I tried customizing my CvSVMParams like this:

CvSVMParams params = CvSVMParams();
params.svm_type = CvSVM::EPS_SVR;
params.kernel_type = CvSVM::POLY;
params.C = 1; params.p = 0.5; params.degree = 1;

and different variations of these parameters, and my SVM classifier performs terribly even when I test on the training set!

Can somebody help me parameterize my SVM for this 5-class classifier? I don't understand which kernel and which SVM type I should use for this problem. Also, how in the world am I supposed to find the values of C, p, and degree for my SVM?

I would assume this is an extremely easy classification problem, since all my objects are nicely bounded in a box, the resolution is fairly good, and the classes (the digits 1, 2, 3, 4, 5) are fairly distinct in appearance. I don't understand why my SVM is doing so poorly. What am I missing here?

1 Answer

A priori, and without experimentation, it's very hard to give you good parameters, but I can give you some ideas.

First, you want to build a multi-class classifier, but you are using a regression algorithm (EPS_SVR). Not that you can't do that, but it is usually easier to start with C-SVM (CvSVM::C_SVC).

Second, I would recommend using an RBF kernel instead of a polynomial one. Poly is very hard to get right, and RBF usually does a better job out of the box.

Third, I would play with several values of C. Don't be shy: try a bigger C (such as 100), which penalizes training errors more heavily and forces the algorithm to fit the training set more closely. It can lead to overfitting, but if you can't even get the algorithm to learn the training set, that's not your immediate problem.

Fourth, I would reduce the dimension of the images at first. Later, once you have a more stable model, you can try the original dimension again if needed.

I really recommend reading the LibSVM guide, which is very easy to follow: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
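To make "play with several values" concrete: the guide suggests a coarse search over exponentially growing values, for example C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3 for an RBF kernel. A minimal sketch that generates those candidates (the helper names pow2Grid, coarseCGrid and coarseGammaGrid are made up for illustration):

```cpp
#include <cmath>
#include <vector>

// Powers of two from 2^lo to 2^hi (inclusive), stepping the exponent by `step`.
// Hypothetical helper, not part of OpenCV or LibSVM.
std::vector<double> pow2Grid(int lo, int hi, int step) {
    std::vector<double> values;
    for (int e = lo; e <= hi; e += step) {
        values.push_back(std::ldexp(1.0, e));  // exactly 2^e
    }
    return values;
}

// Coarse ranges given as an example in the LibSVM guide for an RBF kernel.
std::vector<double> coarseCGrid()     { return pow2Grid(-5, 15, 2); }
std::vector<double> coarseGammaGrid() { return pow2Grid(-15, 3, 2); }
```

That is 11 values of C and 10 of gamma, so 110 (C, gamma) pairs to train and score. Once the best coarse pair is known, a finer grid around it is usually worth a second pass.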

Hope it helps!

EDIT:

I forgot to mention that a good way to pick parameters for an SVM is to perform cross-validation: http://en.wikipedia.org/wiki/Cross-validation_(statistics)

http://www.autonlab.org/tutorials/overfit10.pdf

http://www.youtube.com/watch?v=hihuMBCuSlU

http://www.youtube.com/watch?v=m5StqDv-YlM
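As a rough illustration of k-fold cross-validation, here is a minimal sketch in plain C++. It only handles the splitting and averaging; the trainAndScore callback is a hypothetical placeholder for "train an SVM on these rows and return its accuracy on those rows", and the names assignFolds and crossValidate are made up for this example:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Assign each of n samples to one of k folds after a seeded shuffle.
std::vector<int> assignFolds(std::size_t n, int k, unsigned seed) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<int> fold(n);
    for (std::size_t i = 0; i < n; ++i) {
        fold[order[i]] = static_cast<int>(i % static_cast<std::size_t>(k));
    }
    return fold;
}

// Hypothetical callback: train on trainIdx, return accuracy on testIdx.
using TrainAndScore = std::function<double(const std::vector<std::size_t>&,
                                           const std::vector<std::size_t>&)>;

// Average held-out score over k folds for one parameter setting.
double crossValidate(std::size_t n, int k, unsigned seed,
                     const TrainAndScore& trainAndScore) {
    const std::vector<int> fold = assignFolds(n, k, seed);
    double total = 0.0;
    for (int f = 0; f < k; ++f) {
        std::vector<std::size_t> trainIdx, testIdx;
        for (std::size_t i = 0; i < n; ++i) {
            (fold[i] == f ? testIdx : trainIdx).push_back(i);
        }
        total += trainAndScore(trainIdx, testIdx);
    }
    return total / k;
}
```

With 250 samples and k = 5, each fold holds out 50 samples and trains on the other 200. You would call crossValidate once per candidate parameter setting and keep the setting with the best average score. Note that the shuffle here is not stratified, so the per-class counts in each fold can vary a little.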

EDIT2:

I know it's silly because it's in the title of the question, but I didn't realize you were using HOG descriptors until you pointed it out in the comments.