Machine Learning Classification and Prediction in Weka

Question

I am very new to machine learning. Sorry if there are any mistakes in my English.

I am using Weka's J48 classifier to predict a true/false value. I have a training set of almost 999K instances, which I used to train the model. I used 3-fold cross-validation, which gives me an accuracy of ~84%.

After saving the model, I tested it on a 50K dataset. The results are very bad: about 50% of the predictions are wrong. The data has 11 attributes, a mix of nominal and numeric fields.

I don't know why this is happening.

I have two questions.

How can I train the model so that it performs better on the test set?
What could the possible issues be?

I am using the Weka API in Java.

Actually, I am using 30 days of data for training and 1 day of data for testing and prediction. - Pandit
I am getting the data in a CSV file, which I then convert to ARFF. - Pandit

ruoho ruotsi · Accepted Answer · 2015-11-23T01:44:18

It means that your model is overfit to your 999k training set and doesn't generalize well to your 50k test set.

You should look into cross-validating with a good portion (but not all) of your 50k dataset in addition to your 999k.

You may also want to try k-fold cross-validation with something higher than k=3, because 3 folds may be too "coarse". Good luck!
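
To make "coarse" concrete, here is a minimal sketch in plain Java (no Weka dependency) of how the fold count changes the amount of data each round trains and tests on. The class and method names (`FoldSizes`, `testSize`, `trainSize`) are hypothetical, and handing the remainder to the first folds is an assumption for illustration; Weka's own fold assignment may differ in detail.

```java
public class FoldSizes {

    /** Size of test fold `fold` (0-based) when n instances are split into k folds. */
    public static int testSize(int n, int k, int fold) {
        // Assumption: the first (n % k) folds each take one extra instance.
        return n / k + (fold < n % k ? 1 : 0);
    }

    /** Instances left for training when fold `fold` is held out. */
    public static int trainSize(int n, int k, int fold) {
        return n - testSize(n, k, fold);
    }

    public static void main(String[] args) {
        int n = 999_000; // approximate training-set size from the question
        for (int k : new int[] {3, 10}) {
            System.out.printf("k=%d: train on %d, test on %d per fold%n",
                    k, trainSize(n, k, 0), testSize(n, k, 0));
        }
    }
}
```

With k=3, each round trains on 666,000 instances and tests on 333,000; with k=10, each round trains on 899,100 and tests on 99,900, so every model sees more of the data and the accuracy estimate is averaged over more rounds.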
