I'm using Weka and trying to classify my test set with the nearest neighbor method. Both my training set and my test set have 11 columns with numerical values, the last column being the one to classify. Both were converted from .csv to .arff with the Weka tool.

[Image: preview of the training set]

[Image: preview of the test set]

First I loaded the training set and, in the "Classify" tab under "Test options", selected "Use training set". I chose the "IBk" classifier and set the number of neighbors to 10. The (bad) output was this:

[Image: classification output on the training set]

Next I selected "Supplied test set" and loaded my test set. Only the last column is empty (apart from the header). But when I try to run it, I get the following output saying that no instances were classified:

[Image: classification output on the test set]

At this point I just don't understand what to do. As far as I can tell my test and train set are correct, as they are identical apart from the numerical values in the columns and I'm simply trying to use my test set after having trained on the train set... Somewhere I'm doing something terribly wrong.

So your classes are numeric rather than nominal? And what happens if you put ? in the class attribute's cells of the test file? - Rushdi Shams
Hey Rushdi, in essence they are nominal, but they were replaced with numbers (e.g. tree type 1 = 1). I changed the attribute to nominal and now I get good results when using the training set, so thanks. But the test file still has the same issue: everything is put in unknown. Empty cells or "?" make no difference. I did notice, however, that in the test file the class attribute was labelled as numeric. I changed the ARFF file to make it identical to the training set: @attribute Cover_Type {aspen,lodgepole,spruce,krummholz,ponderosa,douglas,willow} - user2870593
Can't edit the above anymore, but even after applying the above class attribute line in the ARFF file, all instances from my test set are still reported as "unknown". - user2870593
Oh, got it! Check out my answer. - Rushdi Shams

1 Answer

The problem is that you are evaluating on a test set whose class attribute is set to ? or left empty. You get results on the training set because there you know the label of every instance. For a test set with unknown labels, there is nothing to compare a prediction y against, so there is no way to tell whether it is the correct class for a given instance or a misclassification. In short: you can get the predicted labels of the test instances, but you cannot get any evaluation.
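The point above can be sketched in a few lines of Python. This is only an illustration of the idea, not Weka's actual evaluation code: the function name `evaluate`, the `MISSING` constant, and the label lists are all hypothetical. An instance whose true class is "?" cannot be counted as correct or incorrect, so it is skipped.

```python
# Illustrative sketch only; this is NOT how Weka is implemented internally.
MISSING = "?"  # ARFF marker for a missing value


def evaluate(actual_labels, predicted_labels):
    """Count correct/incorrect predictions, skipping unknown true classes."""
    correct = incorrect = ignored = 0
    for actual, predicted in zip(actual_labels, predicted_labels):
        if actual == MISSING:
            ignored += 1      # nothing to compare the prediction against
        elif actual == predicted:
            correct += 1
        else:
            incorrect += 1
    return {
        "evaluated": correct + incorrect,
        "correct": correct,
        "incorrect": incorrect,
        "ignored": ignored,
    }


# Hypothetical predictions for five instances.
predicted = ["1", "1", "-1", "-1", "-1"]

# With known labels, every instance can be scored.
labelled = evaluate(["1", "1", "1", "-1", "-1"], predicted)
print(labelled)
# {'evaluated': 5, 'correct': 4, 'incorrect': 1, 'ignored': 0}

# With "?" labels, the same predictions exist but none can be scored.
unlabelled = evaluate([MISSING] * 5, predicted)
print(unlabelled)
# {'evaluated': 0, 'correct': 0, 'incorrect': 0, 'ignored': 5}
```

The predictions are identical in both calls; only the availability of true labels decides whether an accuracy figure can be computed.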

What follows is purely hypothetical and has no relation to your data:

For instance, on training data, you might have something as follows:

=== Error on training data ===

Correctly Classified Instances           4               80      %
Incorrectly Classified Instances         1               20      %
Kappa statistic                          0.6154
Mean absolute error                      0.2429
Root mean squared error                  0.4016
Relative absolute error                 50.0043 %
Root relative squared error             81.8358 %
Total Number of Instances                5     

But for test data with unknown labels, the output might look something like this:

=== Error on test data ===

Total Number of Instances                0     
Ignored Class Unknown Instances                  5     


=== Confusion Matrix ===

 a b   <-- classified as
 0 0 | a = 1
 0 0 | b = -1

However, you can still get the predictions for the unlabelled instances, as follows:

=== Predictions on test data ===

 inst#     actual  predicted error prediction (feature1,feature2,feature3,feature4)
     1        1:?        1:1       1 (1,7,1,0)
     2        1:?        1:1       1 (1,5,1,0)
     3        1:?       2:-1       0.786 (-1,1,1,0)
     4        1:?       2:-1       0.861 (1,1,1,1)
     5        1:?       2:-1       0.861 (-1,1,1,1)

=== Confusion Matrix ===

 a b   <-- classified as
 2 1 | a = 1
 0 2 | b = -1
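To see why predictions are still available without labels, here is a minimal nearest-neighbor sketch in Python. It is not IBk itself: it assumes plain Euclidean distance and a simple majority vote, with no attribute normalization or distance weighting. The function `knn_predict` and the tiny two-feature data set are made up for illustration; only the class names are borrowed from the question.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Rank training instances by Euclidean distance to the query point.
    order = sorted(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled training data (two numeric features per instance).
train_rows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
              (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["spruce", "spruce", "spruce", "aspen", "aspen", "aspen"]

# Unlabelled test instances: only the features are needed to predict.
test_rows = [(1.1, 1.0), (5.1, 5.0)]
predictions = [knn_predict(train_rows, train_labels, row) for row in test_rows]
print(predictions)  # ['spruce', 'aspen']
```

The classifier only reads the training labels, so the test set's class column can be "?" throughout and a prediction is still produced for every instance.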