
Consider the following fictional ARFF file:

@relation referents
@attribute feature1      NUMERIC
@attribute feature2      NUMERIC
@attribute feature3      NUMERIC
@attribute feature4      NUMERIC
@attribute class{WIN,LOSS}
@data
1, 7, 1, 0, WIN
1, 5, 1, 0, WIN
-1, 1, 1, 0, LOSS
1, 1, 1, 1, WIN
-1, 1, 1, 1, WIN
1, 7, 1, 0, WIN
1, 5, 1, 0, WIN
-1, 1, 1, 0, LOSS
1, 1, 1, 1, WIN
-1, 1, 1, 1, WIN

Using WEKA 3.8, open the above ARFF in Explorer. Click on Classify. Select the J48 classifier, keeping all default settings. Under Test Options, select Percentage split = 50%. Click More Options and select Output Predictions -> CSV.

Click Start.

You will see the following output:

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     referents
Instances:    10
Attributes:   5
              feature1
              feature2
              feature3
              feature4
              class
Test mode:    split 50.0% train, remainder test

=== Classifier model (full training set) ===

J48 pruned tree
------------------

feature1 <= -1
|   feature4 <= 0: LOSS (2.0)
|   feature4 > 0: WIN (2.0)
feature1 > -1: WIN (6.0)

Number of Leaves  :     3

Size of the tree :  5


Time taken to build model: 0 seconds

=== Predictions on test split ===

inst#,actual,predicted,error,prediction
1,2:LOSS,1:WIN,+,0.8
2,1:WIN,1:WIN,,0.8
3,1:WIN,1:WIN,,0.8
4,1:WIN,1:WIN,,0.8
5,1:WIN,1:WIN,,0.8

// skipping the rest of the report...

Observe that the last five instances in the input arff file are in the order

WIN WIN LOSS WIN WIN

However, the actual output 'predictions on test split' is in the order: LOSS WIN WIN WIN WIN

Why are these not in the same order, and how is one supposed to connect the inst# in 'Predictions on test split' to the corresponding @data instance in the ARFF file?

1 Answer


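When Weka splits your data into training and test sets with Percentage split, it first shuffles the instances randomly and only then cuts the data at the given percentage. The test split is therefore a random selection of rows from your ARFF, not the last 5 rows of the file, which is why the order differs. You can set the random seed used for the shuffle in More Options ("Random seed for XVal / % Split"); the same dialog also has a "Preserve order for % Split" option that skips the shuffle.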
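
To see why the test rows no longer line up with the end of the file, here is a minimal Python sketch of the shuffle-then-split idea. It is not Weka's code: the function name `shuffled_percentage_split` is made up for illustration, and Python's random generator differs from Java's, so the resulting order will not match Weka's output even with the same seed. The point is only that keeping the original row number next to each row during the shuffle lets you map an inst# back to its @data row.

```python
import random

# The ten @data rows from the ARFF file above, in file order.
ROWS = [
    (1, 7, 1, 0, "WIN"),
    (1, 5, 1, 0, "WIN"),
    (-1, 1, 1, 0, "LOSS"),
    (1, 1, 1, 1, "WIN"),
    (-1, 1, 1, 1, "WIN"),
    (1, 7, 1, 0, "WIN"),
    (1, 5, 1, 0, "WIN"),
    (-1, 1, 1, 0, "LOSS"),
    (1, 1, 1, 1, "WIN"),
    (-1, 1, 1, 1, "WIN"),
]


def shuffled_percentage_split(rows, train_fraction=0.5, seed=1):
    """Hypothetical helper: shuffle with a seeded RNG, then cut into train/test.

    Each row is paired with its 1-based position in the original file so the
    mapping survives the shuffle.
    """
    indexed = list(enumerate(rows, start=1))  # (original row number, row)
    rng = random.Random(seed)                 # seeded, so the split is repeatable
    rng.shuffle(indexed)
    train_size = round(len(indexed) * train_fraction)
    return indexed[:train_size], indexed[train_size:]


train, test = shuffled_percentage_split(ROWS)

# inst# counts positions inside the test split, not rows of the file.
for inst_no, (orig_row, row) in enumerate(test, start=1):
    print(f"inst# {inst_no} -> @data row {orig_row}: {row}")
```

The same trick carries over to real data: if each instance carries an identifier of its own, that identifier can be matched against the predictions regardless of how the split was shuffled.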