LibSVM - Multi class classification with unbalanced data

Question

I have been experimenting with libsvm and 3D descriptors to perform object recognition. So far I have 7 categories of objects. The number of objects in each category (and its percentage of the total) is:

Category 1. 492 (14%)

Category 2. 574 (16%)

Category 3. 738 (21%)

Category 4. 164 (5%)

Category 5. 369 (10%)

Category 6. 123 (3%)

Category 7. 1025 (29%)

So I have 3485 objects in total.

I followed the libsvm practical guide. As a reminder, its steps are:

A. Scaling the training and testing data

B. Cross-validation

C. Training

D. Testing
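As a small illustration of step A, the sketch below scales features to [-1, 1] using ranges taken from the training set only, then reuses those ranges for the test set. The function names and toy numbers are made up for illustration. With the libsvm tools the same thing is normally done with svm-scale, saving the ranges with -s and restoring them with -r.

```python
def fit_scaling(rows, lower=-1.0, upper=1.0):
    """Compute per-feature (min, max) from the training rows only."""
    columns = list(zip(*rows))
    ranges = [(min(col), max(col)) for col in columns]
    return ranges, lower, upper


def apply_scaling(rows, params):
    """Scale rows linearly with ranges learned from the training set."""
    ranges, lower, upper = params
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # Constant feature: mapping it to 0.0 is a choice made for this sketch.
                out.append(0.0)
            else:
                out.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(out)
    return scaled


# Toy data: two features, three training rows, one test row.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 40.0]]

params = fit_scaling(train)                # ranges come from the training set
train_scaled = apply_scaling(train, params)
test_scaled = apply_scaling(test, params)  # same ranges reused for the test set
print(train_scaled)
print(test_scaled)
```

Test values can land outside [-1, 1] when they fall outside the training range. That is expected and is not a reason to rescale the test set separately.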

I split my data into training and testing sets. Using 5-fold cross-validation, I determined good values for C and gamma.
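This kind of search is usually a grid search. The sketch below assumes a scaled training file with the hypothetical name train.scale. It builds one svm-train cross-validation command (-v 5) per (C, gamma) pair on an exponentially spaced grid of the kind the guide suggests. It only builds the commands; running them and picking the pair with the best reported accuracy is left out. The grid.py script shipped with libsvm automates the same idea.

```python
import itertools

# Exponentially spaced candidate values (a coarse grid).
c_values = [2.0 ** e for e in range(-5, 16, 2)]      # 2^-5, 2^-3, ..., 2^15
gamma_values = [2.0 ** e for e in range(-15, 4, 2)]  # 2^-15, 2^-13, ..., 2^3


def cv_command(c, gamma, train_file, folds=5):
    """Build (but do not run) an svm-train call in cross-validation mode."""
    return ["svm-train", "-v", str(folds), "-c", str(c), "-g", str(gamma), train_file]


# "train.scale" is a placeholder file name, not one taken from the post.
commands = [
    cv_command(c, g, "train.scale")
    for c, g in itertools.product(c_values, gamma_values)
]
print(len(commands), "parameter pairs to evaluate")
print(" ".join(commands[0]))
```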

However, I obtained poor results (the CV accuracy is about 30-40% and my accuracy is about 50%).

Then I looked at my data again and saw that it is unbalanced (categories 4 and 6, for example). I discovered that libsvm has an option for class weights, so I would now like to set appropriate weights.

So far I am doing this:

svm-train -c cValue -g gValue -w1 1 -w2 1 -w3 1 -w4 2 -w5 1 -w6 2 -w7 1

However, the results are the same. I'm sure this is not the right way to do it, which is why I am asking for help. I saw some topics on the subject, but they dealt with binary classification rather than multiclass classification. I know that libsvm uses "one against one" (so binary classifiers), but I don't know how to handle that when I have multiple classes.
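For reference, -wi multiplies C for the class labelled i in every pairwise classifier that class takes part in, so the option is used the same way with 7 classes as with 2. A common heuristic, which is an assumption here and not something verified on this data, is to make each weight inversely proportional to the class size. A minimal sketch that derives such weights from the counts listed above and prints the matching flags (the helper names are hypothetical):

```python
# Object counts per category, taken from the list in the question.
counts = {1: 492, 2: 574, 3: 738, 4: 164, 5: 369, 6: 123, 7: 1025}


def inverse_frequency_weights(counts):
    """Weight each class by (largest class size / its own size)."""
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


def weight_flags(weights):
    """Turn {label: weight} into svm-train '-wLABEL weight' arguments."""
    flags = []
    for label in sorted(weights):
        flags += [f"-w{label}", f"{weights[label]:.2f}"]
    return flags


weights = inverse_frequency_weights(counts)
flags = weight_flags(weights)
# cValue and gValue are the placeholders used in the original command.
print("svm-train -c cValue -g gValue " + " ".join(flags))
```

The labels in the flags have to match the labels used in the training file. Since C and gamma were selected with all weights equal to 1, it is probably worth repeating the cross-validation with the weights in place.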

Could you please help me?

Thank you in advance for your help.

Runhao Lu · Accepted Answer · 2017-03-06T00:40:22

I've run into the same problem before. I also tried giving the classes different weights, which didn't work.

I recommend training on a subset of the dataset.

Try to use an approximately equal number of samples from each class. You can use all the category 4 and 6 samples, and then pick about 150 samples from each of the other categories.
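A minimal sketch of this subsampling, assuming the training samples are held as (label, features) pairs. The function name, the keep_all parameter and the toy data are illustrative and not part of libsvm:

```python
import random
from collections import defaultdict


def balanced_subset(samples, per_class=150, keep_all=(4, 6), seed=0):
    """Keep every sample of the classes in `keep_all` and draw at most
    `per_class` random samples from each of the other classes.

    `samples` is a list of (label, features) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for label, features in samples:
        by_label[label].append((label, features))

    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        if label not in keep_all and len(group) > per_class:
            group = rng.sample(group, per_class)
        subset.extend(group)
    rng.shuffle(subset)  # avoid feeding the classes in sorted blocks
    return subset


# Toy data with the class sizes from the question; the features are placeholders.
counts = {1: 492, 2: 574, 3: 738, 4: 164, 5: 369, 6: 123, 7: 1025}
samples = [(label, [float(i)]) for label, n in counts.items() for i in range(n)]

subset = balanced_subset(samples)
sizes = {label: sum(1 for lab, _ in subset if lab == label) for label in counts}
print(sizes)
```

It is probably safer to subsample only the training portion and leave the test set with its original class distribution, so the reported accuracy still reflects the real data.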

I used this method and the accuracy did improve. Hope this helps!
