0
votes

I've encountered the following problem: I'm trying to classify a large number of text documents.

There are 20 classes: 1 normal and 19 abnormal. When I use Naïve Bayes classification I get the following result: classification works well for the 19 abnormal classes, but for the "normal" class I get many misclassification errors: almost all samples in the "normal" category are classified into some other (abnormal) category.

Here are my questions:

  • How should I select the training set for the "normal" class? (Currently I just fit the classifier on a set of texts labelled "normal", at a 1/20 proportion.)
  • Can the classifier be set up this way: if the probability of belonging to every class is below a certain threshold, then the classifier assigns a fallback category (e.g. "normal") to that sample?
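The thresholding idea in the second question is usually called a rejection option, and it can be implemented on top of any classifier that exposes class probabilities. A minimal sketch using scikit-learn's MultinomialNB (the corpus, labels, threshold value, and the `predict_with_reject` helper are all illustrative, not part of the original post):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical tiny training corpus with two abnormal classes.
docs = ["disk failure error", "disk error crash",
        "login denied password", "password denied access"]
labels = ["disk", "disk", "auth", "auth"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

def predict_with_reject(texts, threshold=0.6):
    """Assign 'normal' when no class probability reaches the threshold."""
    probs = clf.predict_proba(vec.transform(texts))
    results = []
    for row in probs:
        best = np.argmax(row)
        results.append(clf.classes_[best] if row[best] >= threshold else "normal")
    return results

# A document full of unseen words gets roughly the class priors as its
# probabilities, falls below the threshold, and is rejected as "normal".
print(predict_with_reject(["disk error", "weather is nice today"]))
```

The threshold itself should be tuned on held-out data, since it trades off missed abnormal cases against false "normal" assignments.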

2 Answers

1
votes

I'm not sure I have the full picture, but it seems like you in fact have only 2 classes, "normal" and "abnormal", which are unbalanced in volume and therefore in prior.

To answer your first question: in that situation I would try over-sampling your "normal" class for training (passing the same "normal" instances multiple times to "fake" a larger volume) and see if it improves your performance.
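The over-sampling suggested above can be done by simply duplicating minority-class samples before fitting; a minimal sketch in plain Python (the data, the `oversample` helper, and the duplication factor are illustrative):

```python
from collections import Counter

# Hypothetical training data where "normal" is the minority class.
docs = ["system ok", "disk fail", "disk err", "disk bad",
        "auth err", "auth fail", "auth bad"]
labels = ["normal", "disk", "disk", "disk", "auth", "auth", "auth"]

def oversample(docs, labels, target_class, factor):
    """Repeat each sample of `target_class` (factor - 1) extra times."""
    new_docs, new_labels = list(docs), list(labels)
    for d, l in zip(docs, labels):
        if l == target_class:
            new_docs.extend([d] * (factor - 1))
            new_labels.extend([l] * (factor - 1))
    return new_docs, new_labels

balanced_docs, balanced_labels = oversample(docs, labels, "normal", 3)
print(Counter(balanced_labels))  # class counts after over-sampling
```

Note that with Naïve Bayes this mainly shifts the estimated class prior; an equivalent and cheaper alternative in scikit-learn would be to pass explicit `class_prior` values to `MultinomialNB`.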

I don't get your second question.

2
votes

Most probably the unbalanced number of instances per class is causing the problem. You need to define some kind of prior over the final class estimate to counteract the imbalance, and you should fine-tune that prior's hyperparameter by cross-validation. I believe a Dirichlet prior (the smoothing parameter) is what is used for multinomial NB.
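In scikit-learn the Dirichlet prior of multinomial NB corresponds to the `alpha` smoothing parameter, which can be cross-validated as described above. A minimal sketch (the corpus, labels, and candidate `alpha` grid are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical small corpus with two classes, four samples each.
docs = ["disk failure detected", "disk error crash",
        "bad disk sector", "disk io error",
        "login denied twice", "wrong password entered",
        "password denied access", "auth token expired"]
labels = ["disk"] * 4 + ["auth"] * 4

pipe = make_pipeline(CountVectorizer(), MultinomialNB())

# Tune the Dirichlet/smoothing parameter alpha by cross-validation.
grid = GridSearchCV(pipe,
                    {"multinomialnb__alpha": [0.01, 0.1, 1.0, 10.0]},
                    cv=2)
grid.fit(docs, labels)
print(grid.best_params_)
```

`MultinomialNB` also accepts a `class_prior` argument, which lets you set the per-class prior directly instead of estimating it from the (imbalanced) training counts.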