Consider the following fictional ARFF file:
@relation referents
@attribute feature1 NUMERIC
@attribute feature2 NUMERIC
@attribute feature3 NUMERIC
@attribute feature4 NUMERIC
@attribute class {WIN,LOSS}
@data
1, 7, 1, 0, WIN
1, 5, 1, 0, WIN
-1, 1, 1, 0, LOSS
1, 1, 1, 1, WIN
-1, 1, 1, 1, WIN
1, 7, 1, 0, WIN
1, 5, 1, 0, WIN
-1, 1, 1, 0, LOSS
1, 1, 1, 1, WIN
-1, 1, 1, 1, WIN
Using WEKA 3.8, open the above ARFF file in the Explorer. Click the Classify tab. Select the J48 classifier, keeping all default settings. Under Test options, select Percentage split and set it to 50%. Click More options and set Output predictions to CSV.
Click Start.
You will see the following output:
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: referents
Instances: 10
Attributes: 5
feature1
feature2
feature3
feature4
class
Test mode: split 50.0% train, remainder test
=== Classifier model (full training set) ===
J48 pruned tree
------------------
feature1 <= -1
| feature4 <= 0: LOSS (2.0)
| feature4 > 0: WIN (2.0)
feature1 > -1: WIN (6.0)
Number of Leaves : 3
Size of the tree : 5
Time taken to build model: 0 seconds
=== Predictions on test split ===
inst#,actual,predicted,error,prediction
1,2:LOSS,1:WIN,+,0.8
2,1:WIN,1:WIN,,0.8
3,1:WIN,1:WIN,,0.8
4,1:WIN,1:WIN,,0.8
5,1:WIN,1:WIN,,0.8
// skipping the rest of the report...
Observe that the last five instances in the input ARFF file have their classes in this order:
WIN WIN LOSS WIN WIN
However, the 'actual' column under 'Predictions on test split' is in the order: LOSS WIN WIN WIN WIN
Why are these not in the same order? And how is one supposed to map an inst# in 'Predictions on test split' back to the corresponding @data instance in the ARFF file?
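To make the question concrete, here is a minimal Python sketch of one possible explanation. It assumes the rows are shuffled with a seeded random number generator before the split is taken, which would be consistent with the Explorer offering a random seed and, as far as I can tell, a "Preserve order for % Split" checkbox under More options. This is not WEKA's code: `percentage_split` is a hypothetical helper, and Python's generator will not reproduce WEKA's exact ordering. It only shows that if each row carries its original row number through the shuffle, an inst# in the test split can be mapped back to a @data row.

```python
import random

# The ten @data rows from the ARFF file above, in file order.
ROWS = [
    (1, 7, 1, 0, "WIN"),
    (1, 5, 1, 0, "WIN"),
    (-1, 1, 1, 0, "LOSS"),
    (1, 1, 1, 1, "WIN"),
    (-1, 1, 1, 1, "WIN"),
    (1, 7, 1, 0, "WIN"),
    (1, 5, 1, 0, "WIN"),
    (-1, 1, 1, 0, "LOSS"),
    (1, 1, 1, 1, "WIN"),
    (-1, 1, 1, 1, "WIN"),
]


def percentage_split(rows, train_percent, seed=None):
    """Hypothetical helper, not part of WEKA.

    Returns (train, test) as lists of (original_row_number, row) pairs.
    With seed=None the file order is preserved; otherwise the rows are
    shuffled first (the assumed behaviour being illustrated here).
    """
    # Tag every row with its 1-based position in the @data section.
    tagged = list(enumerate(rows, start=1))
    if seed is not None:
        random.Random(seed).shuffle(tagged)
    train_size = round(len(tagged) * train_percent / 100)
    return tagged[:train_size], tagged[train_size:]


if __name__ == "__main__":
    # Shuffled split: inst# no longer matches the position in the file,
    # but the carried row number still identifies the @data row.
    _, test = percentage_split(ROWS, 50, seed=1)
    for inst, (row_no, row) in enumerate(test, start=1):
        print(f"inst# {inst} -> @data row {row_no}, actual class {row[-1]}")
```

Calling `percentage_split(ROWS, 50)` without a seed keeps the file order, so the test split is simply rows 6 to 10. If WEKA does shuffle, the equivalent of the carried row number would be an explicit ID attribute in the data that is printed alongside each prediction; whether and how that can be configured is part of what is being asked.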