I am using the Weka GUI (Explorer) and I want to classify my data according to the class {male, female}. I use the MultiBoostAB classifier with REPTree as the base classifier. I evaluate the accuracy of my classifier by training on a training set (557 instances) and then testing on a test set (200 instances), with about 300 attributes. The accuracy is 83.5% (167 correctly classified instances out of 200) and the kappa statistic is 0.67. I saved this model and used it to predict the label (male or female) of other unknown data, getting almost equally good results. Then I increased the size of my training set to 1000 instances to see if I could improve the accuracy of my classifier. I got the following results:

  • running a test set of 360 instances --> 87.0423% correctly classified instances and kappa statistic 0.7335
  • running a test set of 200 instances --> 59% correctly classified instances and kappa statistic 0.18

(It predicts most of my data as female.) Why does my model get worse when I increase the size of the training set?

1 Answer

Well, without actually seeing and analyzing your training data, this is really hard to say.

My first guess would be that the additional 443 instances you added to your training set are very different from the original 557, so the classifier learns a completely different model.
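One quick way to check whether the added instances differ is to compare the class balance of the two batches before looking at any attributes. The snippet below is only a sketch in plain Python, not something Weka produces: the label lists `original` and `added` are made-up placeholders (I don't know your real class counts), and `class_proportions` is a helper name I chose for illustration.

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of instances belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical labels: replace with the class column of your own data.
original = ["male"] * 280 + ["female"] * 277   # the first 557 instances
added = ["male"] * 120 + ["female"] * 323      # the 443 instances added later

orig_p = class_proportions(original)
added_p = class_proportions(added)

# Print the two distributions side by side, one row per class.
for cls in sorted(set(orig_p) | set(added_p)):
    print(f"{cls:>6}: original {orig_p.get(cls, 0.0):.2%}, added {added_p.get(cls, 0.0):.2%}")
```

If the added batch turned out to be heavily skewed towards one class, as in these invented numbers, that alone could push the model towards predicting female for most instances.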

What happens if you train the model on only those 443 instances? If the accuracy on your test set is even worse, you know that your training data may not be the best to generalize from.
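When you compare those runs, it may help to look at the confusion matrix and not only the accuracy figure. Here is a small sketch, again plain Python and not Weka code, that computes accuracy and Cohen's kappa from a confusion matrix. The matrix is hypothetical: I assumed a balanced 200-instance test set with most instances predicted as female, chosen so that 118 of 200 are correct as in your 59% run. Your real matrix is in Weka's output and will differ.

```python
def accuracy_and_kappa(matrix):
    """Compute accuracy and Cohen's kappa.

    matrix[i][j] is the number of instances whose actual class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]        # actual class counts
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted class counts
    # Agreement expected by chance, given the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical confusion matrix (rows: actual male, actual female;
# columns: predicted male, predicted female). Not taken from your data.
confusion = [
    [25, 75],
    [7, 93],
]

acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

With a balanced two-class test set the chance agreement is 0.5, so kappa is just 2 × accuracy − 1. Your reported pairs (83.5% with 0.67, 59% with 0.18) fit that pattern, which suggests the 200-instance test sets are roughly balanced and the model really is mislabelling most of the males.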