I am using the Weka GUI (Explorer) and I want to classify my data according to the class {male, female}. I use the MultiBoostAB classifier with REPTree as the base classifier. I am evaluating the accuracy of my classifier with a training set (557 instances) and then a separate test set (200 instances), each with about 300 attributes. The accuracy is 83.5% (167 of 200 instances correctly classified) and the kappa statistic is 0.67. I saved this model and used it to predict the label (male or female) of other, unknown data, and got almost equally good results. Then I increased the size of my training set to 1000 instances to see if I could improve the accuracy of my classifier. I got the following results:
- running a test set of 360 instances --> 87.0423% correctly classified instances and kappa statistic 0.7335
- running a test set of 200 instances --> 59% correctly classified instances and kappa statistic 0.18 (it predicts most of my data as female)

Why is my model worse when I increase the size of the training set?
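To see how accuracy and kappa relate when a model favours one class, here is a small sketch in plain Java (it does not use the Weka API). It computes accuracy and Cohen's kappa, (po - pe) / (1 - pe), from a confusion matrix whose rows are actual classes and columns are predicted classes. The class name `KappaDemo` and the matrix in `main` are hypothetical: the matrix is an invented balanced 200-instance test set where the model predicts "female" for most instances, not my actual Weka output, and many other matrices could produce similar numbers.

```java
public class KappaDemo {

    // Total number of instances in the confusion matrix.
    static long total(long[][] m) {
        long n = 0;
        for (long[] row : m) {
            for (long v : row) {
                n += v;
            }
        }
        return n;
    }

    // Observed agreement (accuracy): correctly classified instances / all instances.
    // m[i][j] = number of instances whose actual class is i and predicted class is j.
    static double accuracy(long[][] m) {
        long correct = 0;
        for (int k = 0; k < m.length; k++) {
            correct += m[k][k];
        }
        return (double) correct / total(m);
    }

    // Cohen's kappa: agreement beyond what the row/column totals would give by chance.
    static double kappa(long[][] m) {
        long n = total(m);
        double po = accuracy(m);
        double pe = 0.0;
        for (int k = 0; k < m.length; k++) {
            long rowTotal = 0; // how many instances actually belong to class k
            long colTotal = 0; // how many instances were predicted as class k
            for (int j = 0; j < m.length; j++) {
                rowTotal += m[k][j];
                colTotal += m[j][k];
            }
            pe += (double) rowTotal * colTotal;
        }
        pe /= (double) n * n; // expected chance agreement
        return (po - pe) / (1.0 - pe);
    }

    public static void main(String[] args) {
        // Hypothetical matrix, class order {male, female}:
        // 100 actual males (20 predicted male, 80 predicted female),
        // 100 actual females (2 predicted male, 98 predicted female).
        long[][] m = { {20, 80}, {2, 98} };
        System.out.printf("accuracy = %.3f, kappa = %.3f%n", accuracy(m), kappa(m));
        // prints "accuracy = 0.590, kappa = 0.180"
    }
}
```

With this invented matrix, predicting "female" for 178 of 200 instances still yields 59% accuracy, but kappa is low because most of that agreement is what the skewed predictions would achieve by chance on a balanced set.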