Analyze Data Set on WEKA

Question

I'm new to WEKA and would like to know whether I'm using it correctly.

1) I have a data set of 11377 records, classified as follows:

11111 records have class YES
266 records have class NO

(For some reason, I can only use the J48 algorithm for classification.) When I select J48, the model misclassifies the records of class "NO" because the class distribution is unbalanced. What is the correct way to solve this problem?

2) After balancing the classes, I have to divide the data set into a training set and a test set. What is the best filter in WEKA for this task?

3) Once the data has been preprocessed and I have selected J48 in the Classify tab, what should I evaluate on: the training set or the test set? How many times do I have to repeat the tests?

Thanks in advance!

zbicyclist · Accepted Answer · 2018-01-04T05:42:21

Here's one approach. In the Preprocess tab, apply the ClassBalancer filter (under filters > supervised > instance). It reweights the instances so that the YES and NO classes carry equal total weight.
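To get a feel for what that reweighting does with your counts, here is a minimal sketch of the arithmetic in plain Java. It is not WEKA's code. It assumes that every class ends up with the same total weight, that the overall sum of weights stays at 11377, and that every class has at least one instance. The class and method names are made up for illustration.

```java
public class ClassBalanceWeights {

    /**
     * Per-instance weight for each class, chosen so that every class has the
     * same total weight and the overall total weight is unchanged.
     * Assumption: every class count is greater than zero.
     */
    public static double[] perInstanceWeights(int[] classCounts) {
        int total = 0;
        for (int count : classCounts) {
            total += count;
        }
        // Each class should end up with an equal share of the total weight.
        double targetPerClass = (double) total / classCounts.length;

        double[] weights = new double[classCounts.length];
        for (int i = 0; i < classCounts.length; i++) {
            weights[i] = targetPerClass / classCounts[i];
        }
        return weights;
    }

    public static void main(String[] args) {
        int[] counts = {11111, 266}; // YES, NO (from the question)
        double[] w = perInstanceWeights(counts);
        System.out.printf("YES per-instance weight: %.4f%n", w[0]);
        System.out.printf("NO  per-instance weight: %.4f%n", w[1]);
    }
}
```

Under these assumptions each YES record counts for roughly 0.51 of an instance and each NO record for roughly 21.4, so both classes add up to 5688.5.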

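In the Classify tab, choose "Percentage split" as the test option to divide the data into training and test sets. The default is 66% training, 34% test, and the split is made at random. The model is built on the training part and the reported results come from the test part.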

(If you want to see whether the results depend on the exact random split, you can run it several times with a different random seed. Under the test options you will see a "More options" button. Click it and you will see that the random seed is set to the default of 1. Change this to any other positive integer.)
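If it helps to see why the seed matters, the idea of a seeded percentage split can be sketched in plain Java as below. This only illustrates the principle (shuffle with a seed, then cut at 66%) and is not WEKA's own implementation; the class and method names are hypothetical, and the rounding of the training size is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededSplit {

    /** Record indices 0..n-1 in an order that depends only on the seed. */
    public static List<Integer> shuffledIndices(int n, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    /** Number of records that go to training (assumed to be rounded). */
    public static int trainSize(int n, double percent) {
        return (int) Math.round(n * percent / 100.0);
    }

    public static void main(String[] args) {
        int n = 11377;                    // records in the question
        int cut = trainSize(n, 66.0);     // default 66% training

        for (long seed = 1; seed <= 3; seed++) {
            List<Integer> order = shuffledIndices(n, seed);
            List<Integer> train = order.subList(0, cut);
            List<Integer> test = order.subList(cut, n);
            System.out.println("seed " + seed + ": train=" + train.size()
                    + ", test=" + test.size()
                    + ", first test record=" + test.get(0));
        }
    }
}
```

The same seed always gives the same split, so a run is reproducible; a different seed puts different records in the test part, which is what lets you check how stable the results are.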

You should be able to select from several algorithms, not just J48. Not sure why that's happening.

Note that the results you get will reflect the weighted instances, so you will likely need to convert them back (i.e. take the confusion matrix and convert it back to the actual numbers of YES and NO records).
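A rough sketch of that conversion in plain Java follows. It assumes the rows of the confusion matrix are the actual classes and that every record of a class carries the same weight, so dividing each row by that class's per-instance weight recovers the record counts. The test-set counts in the example are invented purely to show the round trip, and the names are hypothetical.

```java
public class UnweightConfusion {

    /**
     * Converts a weighted confusion matrix (rows = actual class) back to
     * record counts by dividing each row by that class's per-instance weight.
     */
    public static long[][] toCounts(double[][] weighted, double[] perInstanceWeight) {
        long[][] counts = new long[weighted.length][];
        for (int row = 0; row < weighted.length; row++) {
            counts[row] = new long[weighted[row].length];
            for (int col = 0; col < weighted[row].length; col++) {
                counts[row][col] = Math.round(weighted[row][col] / perInstanceWeight[row]);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Per-instance weights if both classes share a total weight of 11377 equally.
        double wYes = 5688.5 / 11111;
        double wNo = 5688.5 / 266;

        // Invented test-set outcome: rows are actual YES, actual NO.
        double[][] weighted = {
            {3700 * wYes, 78 * wYes},
            {30 * wNo, 60 * wNo}
        };

        long[][] counts = toCounts(weighted, new double[]{wYes, wNo});
        System.out.println("actual YES: " + counts[0][0] + " as YES, " + counts[0][1] + " as NO");
        System.out.println("actual NO:  " + counts[1][0] + " as YES, " + counts[1][1] + " as NO");
    }
}
```

Reading the result in record counts makes it easier to judge how many of the few NO records the tree actually gets right.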
