1
votes

How can I make Weka predict the smaller class? I have a data set where the positive class is 35% of the instances and the negative class is 65%. I want Weka to predict the positive class, but in some cases the resulting model predicts every instance as negative. Either way, it favors the negative (larger) class. How can I force it to predict the positive (smaller) class?

3
This is called a 2:1 class imbalance. You might get better answers on the sister site CrossValidated for statistics. – smci
Which specific classifier? Weka seems to have at least 50. – smci

3 Answers

0
votes

One simple solution is to adjust your training set to be more balanced (50% positive, 50% negative) so that the model is encouraged to predict both classes. I would guess that negative cases really are more common in the problem space, so if you drop some of them you need to make sure the remaining negative cases still represent the problem well.

Since the ratio of positive to negative is 1:2, you could also try duplicating the positive cases in the training set to make it 2:2 and see how that goes.
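As a rough illustration of the duplication idea, here is a minimal Python sketch (not Weka code) that randomly repeats minority-class rows until both classes have the same count. The function name `oversample_minority`, the `label_of` callback, and the toy data are all made up for this example. Apply this only to the training set and leave the test set with its original distribution.

```python
import random

def oversample_minority(rows, label_of, seed=0):
    """Duplicate minority-class rows at random until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority count.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy data mirroring the 35% / 65% split in the question (hypothetical rows).
data = [("x%d" % i, "pos") for i in range(35)] + [("x%d" % i, "neg") for i in range(35, 100)]
balanced = oversample_minority(data, label_of=lambda row: row[1])
```

After this, `balanced` holds the 65 negative rows plus 65 positive rows (the 35 originals and 30 random duplicates).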

0
votes

Use stratified sampling (e.g. train on a 50%/50% sample) or class weights/class priors. It would help greatly if you told us which specific classifier you are using; Weka seems to have at least 50.

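Is the penalty for Type I errors (false positives) equal to the penalty for Type II errors (false negatives)? Equal penalties are just one special case: each cutoff value corresponds to one point on the receiver operating characteristic (ROC) curve. If the penalties are not equal, experiment with the cutoff value and look at the AUC.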
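To make "experiment with the cutoff value" concrete, here is a small Python sketch (again, not Weka code) that tries each observed score as a cutoff and keeps the one with the lowest total penalty. The scores, labels, and the 1:2 penalties are invented for illustration; `total_cost` and `best_cutoff` are hypothetical helper names.

```python
def total_cost(scores, labels, cutoff, cost_fp=1.0, cost_fn=2.0):
    """Sum misclassification costs when predicting positive for score >= cutoff."""
    cost = 0.0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and not is_positive:
            cost += cost_fp  # Type I error (false positive)
        elif not predicted_positive and is_positive:
            cost += cost_fn  # Type II error (false negative)
    return cost

def best_cutoff(scores, labels, cost_fp=1.0, cost_fn=2.0):
    """Try each observed score as a cutoff and return the cheapest one."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda c: total_cost(scores, labels, c, cost_fp, cost_fn))

# Hypothetical predicted probabilities of the positive class, with true labels.
scores = [0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.80]
labels = [False, False, True, False, True, True, True]
cutoff = best_cutoff(scores, labels)
```

On this toy data the usual 0.5 cutoff misses two positives, while the cheapest cutoff is lower, which is the typical pattern when missing a positive is the more expensive error. In practice, pick the cutoff on held-out data rather than on the training set.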

You probably also want to read the sister site CrossValidated for statistics.

0
votes

Use CostSensitiveClassifier, which is available under "meta" classifiers.

You will need to change "classifier" to your J48 and (!) change the cost matrix to be like [(0,1), (2,0)]. This tells J48 that misclassifying a positive instance is twice as costly as misclassifying a negative instance. Of course, you should adjust the cost matrix according to your business values.
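To see why such a matrix shifts predictions toward the positive class, here is a minimal Python sketch of a minimum-expected-cost decision rule. It illustrates the idea only and is not Weka's implementation (depending on its settings, CostSensitiveClassifier may instead reweight the training instances). The orientation `cost_matrix[actual][predicted]` and the class order (index 0 = negative, index 1 = positive) are assumptions of this sketch, so check how your Weka version lays out rows and columns.

```python
def min_expected_cost_class(probabilities, cost_matrix):
    """Return the index of the predicted class with the lowest expected cost.

    Assumed orientation: cost_matrix[actual][predicted] is the cost of
    predicting `predicted` when the true class is `actual`.
    """
    n = len(probabilities)
    expected = [
        sum(probabilities[actual] * cost_matrix[actual][predicted] for actual in range(n))
        for predicted in range(n)
    ]
    return min(range(n), key=lambda predicted: expected[predicted])

# Assumed class order: index 0 = negative, index 1 = positive.
COSTS = [[0, 1],
         [2, 0]]

# The model is only 40% sure of "positive", yet positive is the cheaper guess:
# expected cost of predicting negative = 0.4 * 2 = 0.8,
# expected cost of predicting positive = 0.6 * 1 = 0.6.
prediction = min_expected_cost_class([0.6, 0.4], COSTS)
```

With these costs the rule predicts positive whenever the estimated probability of positive exceeds 1/3, instead of the usual 1/2.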