I am not new to data mining, yet I am completely stumped by these WEKA results. I was hoping for some help. Thanks in advance!
I have a data set of numeric vectors with a binary class label (S, H). I train a NaiveBayes model (although the particular method really doesn't matter) using leave-one-out cross-validation. The setup and results are below:
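For reference, the run is roughly equivalent to the following WEKA API sketch (a sketch only; the ARFF file name, random seed, and class index are placeholders, not my actual setup):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoocvSketch {
    public static void main(String[] args) throws Exception {
        // Load the training data; the last attribute is the (H, S) class
        Instances train = DataSource.read("train.arff");   // placeholder file name
        train.setClassIndex(train.numAttributes() - 1);

        // Leave-one-out cross-validation: number of folds == number of instances
        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(new NaiveBayes(), train,
                train.numInstances(), new Random(1));

        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}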
=== Predictions on test data ===
inst# actual predicted error distribution
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 2:S + 0,*1
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *0.997,0.003
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 1:H + *1,0
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 1:H + *1,0
=== Stratified cross-validation ===
=== Summary ===
Total Number of Instances 66
=== Confusion Matrix ===
a b <-- classified as
14 1 | a = H
2 49 | b = S
As you can see, there are three errors in both the prediction output and the confusion matrix. I then re-evaluate the model on an independent test set with the same attributes and the same two classes. The setup and result are below:
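In API terms, the re-evaluation step amounts to something like this (a sketch continuing the imports above; file names are again placeholders):

// Build on the full training set, then evaluate on the supplied test set
Instances train = DataSource.read("train.arff");   // placeholder
Instances test  = DataSource.read("test.arff");    // placeholder
train.setClassIndex(train.numAttributes() - 1);
test.setClassIndex(test.numAttributes() - 1);

NaiveBayes nb = new NaiveBayes();
nb.buildClassifier(train);

Evaluation eval = new Evaluation(train);
eval.evaluateModel(nb, test);
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());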
=== Re-evaluation on test set ===
User supplied test set
Relation: FCBC_New.TagProt
Instances: unknown (yet). Reading incrementally
Attributes: 355
=== Predictions on user test set ===
inst# actual predicted error distribution
1 1:S 2:H + 0,*1
2 1:S 1:S *1,0
3 1:S 2:H + 0,*1
4 2:H 1:S + *1,0
5 2:H 2:H 0,*1
6 1:S 2:H + 0,*1
7 1:S 2:H + 0,*1
8 2:H 2:H 0,*1
9 1:S 1:S *1,0
10 1:S 2:H + 0,*1
11 1:S 2:H + 0,*1
12 2:H 1:S + *1,0
13 2:H 2:H 0,*1
14 1:S 2:H + 0,*1
15 1:S 2:H + 0,*1
16 1:S 2:H + 0,*1
17 2:H 2:H 0,*1
18 2:H 2:H 0,*1
19 1:S 2:H + 0,*1
20 1:S 2:H + 0,*1
21 1:S 2:H + 0,*1
22 1:S 1:S *1,0
23 1:S 2:H + 0,*1
24 1:S 2:H + 0,*1
25 2:H 1:S + *1,0
26 1:S 2:H + 0,*1
27 1:S 1:S *1,0
28 1:S 2:H + 0,*1
29 1:S 2:H + 0,*1
30 1:S 2:H + 0,*1
31 1:S 2:H + 0,*1
32 1:S 2:H + 0,*1
33 1:S 2:H + 0,*1
34 1:S 1:S *1,0
35 2:H 1:S + *1,0
36 1:S 2:H + 0,*1
37 1:S 1:S *1,0
38 1:S 1:S *1,0
39 2:H 1:S + *1,0
40 1:S 2:H + 0,*1
41 1:S 2:H + 0,*1
42 1:S 2:H + 0,*1
43 1:S 2:H + 0,*1
44 1:S 2:H + 0,*1
45 1:S 2:H + 0,*1
46 1:S 2:H + 0,*1
47 2:H 1:S + *1,0
48 1:S 2:H + 0,*1
49 2:H 1:S + *1,0
50 2:H 1:S + *1,0
51 1:S 2:H + 0,*1
52 1:S 2:H + 0,*1
53 2:H 1:S + *1,0
54 1:S 2:H + 0,*1
55 1:S 2:H + 0,*1
56 1:S 2:H + 0,*1
=== Summary ===
Correctly Classified Instances 44 78.5714 %
Incorrectly Classified Instances 12 21.4286 %
Kappa statistic 0.4545
Mean absolute error 0.2143
Root mean squared error 0.4629
Coverage of cases (0.95 level) 78.5714 %
Total Number of Instances 56
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.643 0.167 0.563 0.643 0.600 0.456 0.828 0.566 H
0.833 0.357 0.875 0.833 0.854 0.456 0.804 0.891 S
Weighted Avg. 0.786 0.310 0.797 0.786 0.790 0.456 0.810 0.810
=== Confusion Matrix ===
a b <-- classified as
9 5 | a = H
7 35 | b = S
And here is where my problem lies. The prediction output clearly shows many errors; in fact, there are 44. The confusion matrix and the result summary, on the other hand, say there are only 12. If the predicted classes were reversed, the confusion matrix would be correct.

So I look at the distribution column: in the cross-validation results the value before the comma corresponds to the H class and the value after it to the S class (so 1,0 means an H prediction). In the test-set results, however, these are reversed, and 1,0 means an S prediction. So if I go by the score distribution, the confusion matrix is right; if I go by the predicted label (H or S), the confusion matrix is wrong.

I tried changing all the class values in the test file to H or S. This does NOT change the prediction output or the confusion matrix totals: in the confusion matrix, 16 instances are always classified as a (H) and 40 as b (S), even though the plain-text prediction output actually shows 16 b (S) and 40 a (H).

Any ideas what is going wrong? It must be something simple, but I am completely and totally at a loss...
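If it helps in diagnosing this, the class indices above (1:H / 2:S in cross-validation versus 1:S / 2:H on the test set) can be inspected directly with something like the following sketch (using the same train/test objects as above):

// Compare how the two files declare the class attribute and whether the headers match
System.out.println("train class attribute: " + train.classAttribute());
System.out.println("test  class attribute: " + test.classAttribute());
System.out.println("headers equal: " + train.equalHeaders(test));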