
Background:

If I open the Weka Explorer GUI, train a J48 tree on the NSL-KDD training dataset, and test it on the NSL-KDD testing dataset, a pruned tree is produced. The Explorer expresses the algorithm's reasoning for classifying something as an anomaly or not in terms of tests such as src_bytes <= 28.

Screenshot of Weka Explorer GUI showing pruned tree

Question:

Referring to the pruned tree example produced by the Weka Explorer GUI, how can I programmatically have Weka express the reasoning behind each instance's classification in Java?

For example: Instance A was classified as an anomaly because src_bytes < 28 && dst_host_srv_count < 88 && dst_bytes < 3, etc.

So far I've been able to:

  • Train and test a J48 tree on the NSL-KDD dataset.

  • Output a description of the J48 tree within Java.

  • Return the J48 tree as an if-then statement.

But I have no idea how, while iterating through each instance during the testing phase, to express the reasoning for each classification without manually writing out the J48 tree as an if-then statement each time and adding numerous println calls to report which branch was triggered (which I'd really rather not do, as it would dramatically increase the human intervention required in the long term).

Additional Screenshots:

Screenshot of the 'description of the J48 tree within Java'

Screenshot of the 'J48 tree as an if-then statement'

Code:

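    import weka.classifiers.meta.FilteredClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.unsupervised.attribute.Remove;

    public class Junction_Tree {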

    String train_path = "KDDTrain+.arff";
    String test_path = "KDDTest+.arff";
    double accuracy;
    double recall;
    double precision;
    int correctPredictions;
    int incorrectPredictions;
    int numAnomaliesDetected;
    int numNetworkRecords;

    public void run() {
        try {
            Instances train = DataSource.read(train_path);
            Instances test = DataSource.read(test_path);
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            if (!train.equalHeaders(test))
                throw new IllegalArgumentException("datasets are not compatible..");

            Remove rm = new Remove();
            rm.setAttributeIndices("1");

            J48 j48 = new J48();
            // Note: this builds an unpruned tree; J48 prunes by default when this is left unset.
            j48.setUnpruned(true);

            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(rm);
            fc.setClassifier(j48);

            fc.buildClassifier(train);

            numAnomaliesDetected = 0;
            numNetworkRecords = 0;
            int n_ana_p = 0;
            int ana_p = 0;
            correctPredictions = 0;
            incorrectPredictions = 0;

            for (int i = 0; i < test.numInstances(); i++) {
                double pred = fc.classifyInstance(test.instance(i));
                String a = "anomaly";
                String actual;
                String predicted;
                actual = test.classAttribute().value((int) test.instance(i).classValue());
                predicted = test.classAttribute().value((int) pred);

                if (actual.equalsIgnoreCase(a))
                    numAnomaliesDetected++;
                if (actual.equalsIgnoreCase(predicted))
                    correctPredictions++;
                if (!actual.equalsIgnoreCase(predicted))
                    incorrectPredictions++;
                if (actual.equalsIgnoreCase(a) && predicted.equalsIgnoreCase(a))
                    ana_p++;
                if ((!actual.equalsIgnoreCase(a)) && predicted.equalsIgnoreCase(a))
                    n_ana_p++;
                numNetworkRecords++;
            }
            // Multiply by 100.0 so the division is floating-point rather than integer.
            accuracy = (correctPredictions * 100.0) / (correctPredictions + incorrectPredictions);
            recall = ana_p * 100.0 / numAnomaliesDetected;
            precision = ana_p * 100.0 / (ana_p + n_ana_p);

            System.out.println("\n\naccuracy: " + accuracy + ", Correct Predictions: " + correctPredictions
                    + ", Incorrect Predictions: " + incorrectPredictions);

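            // toSource takes the name of the generated class as a String,
            // so it must be a valid Java identifier. writeFile is a helper not shown here.
            writeFile(j48.toSource("J48_if_then"));

            writeFile(j48.toString());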

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Junction_Tree JT1 = new Junction_Tree();
        JT1.run();
    }

}

1 Answer


I have never used it myself, but according to the WEKA documentation the J48 class includes a getMembershipValues method. This method should return an array that indicates the node membership of an instance. One of the few mentions of this method appears to be in this thread on the WEKA forums.
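
To give a rough idea of how such an array might be read, here is a minimal sketch. It assumes the array holds one weight per tree node and that a node has a non-zero weight when the instance passes through it; I have not verified this against a real tree. MembershipReader and visitedNodes are hypothetical names, and the sample values are made up. Note that the array would only give you node positions, not the tests themselves, and that with a FilteredClassifier the instance would presumably need to go through the same filter before being handed to the J48.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading a node-membership array. */
class MembershipReader {

    /** Returns the indices of the nodes whose membership weight is non-zero. */
    static List<Integer> visitedNodes(double[] membership) {
        List<Integer> visited = new ArrayList<>();
        for (int i = 0; i < membership.length; i++) {
            if (membership[i] > 0.0) {
                visited.add(i);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Illustrative values only. With Weka on the classpath the array would
        // come from j48.getMembershipValues(instance) instead.
        double[] membership = {1.0, 1.0, 0.0, 0.0, 1.0};
        System.out.println(visitedNodes(membership)); // prints [0, 1, 4]
    }
}
```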

Beyond this, I can't find any information on alternatives to the approach you mentioned.
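
One way to avoid hand-written println statements might be to reuse the text description you already get from j48.toString() and trace each instance through it. The sketch below is not a Weka API: TreeTextTracer, explain and holds are hypothetical names. It assumes the usual indented layout in which each level is prefixed with "|   ", tests look like "attr <= value", "attr > value" or "attr = value", and leaves end in ": label (counts)". The sample tree reuses the thresholds from your question, but its shape and counts are made up, and missing values are simply treated as "no branch matches".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: traces one instance through the text form of a tree. */
class TreeTextTracer {

    // One tree line after its "|   " indentation is stripped, e.g.
    // "dst_bytes <= 3: anomaly (10.0)" (a leaf) or "src_bytes <= 28" (inner node).
    private static final Pattern NODE =
            Pattern.compile("(\\S+) (<=|>|=) (.+?)(?:: (\\S+) \\(.*\\))?");

    private static final String INDENT = "|   ";

    /** Returns e.g. "anomaly because src_bytes <= 28 && dst_bytes <= 3". */
    static String explain(String treeText, Map<String, String> instance) {
        List<String> path = new ArrayList<>();
        int depth = 0; // depth of the branch we are currently trying to match
        for (String line : treeText.split("\\R")) {
            int lineDepth = 0;
            while (line.startsWith(INDENT)) {
                line = line.substring(INDENT.length());
                lineDepth++;
            }
            Matcher m = NODE.matcher(line.trim());
            if (!m.matches()) continue;      // header, blank line or summary line
            if (lineDepth > depth) continue; // inside a branch we did not take
            if (lineDepth < depth) break;    // no branch matched at this level
            if (!holds(m.group(2), instance.get(m.group(1)), m.group(3))) continue;
            path.add(m.group(1) + " " + m.group(2) + " " + m.group(3));
            if (m.group(4) != null) {        // reached a leaf: group 4 is the class label
                return m.group(4) + " because " + String.join(" && ", path);
            }
            depth++;                         // descend into the matching branch
        }
        return "unclassified (no matching leaf found)";
    }

    /** Evaluates a single test against the instance's value for that attribute. */
    private static boolean holds(String op, String actual, String threshold) {
        if (actual == null) return false;    // attribute missing from the instance
        if (op.equals("=")) return actual.equals(threshold);
        double a = Double.parseDouble(actual);
        double t = Double.parseDouble(threshold);
        return op.equals("<=") ? a <= t : a > t;
    }

    public static void main(String[] args) {
        // Made-up tree text in the assumed layout.
        String tree = String.join("\n",
                "J48 pruned tree",
                "------------------",
                "",
                "src_bytes <= 28",
                "|   dst_host_srv_count <= 88",
                "|   |   dst_bytes <= 3: anomaly (10.0)",
                "|   |   dst_bytes > 3: normal (4.0)",
                "|   dst_host_srv_count > 88: normal (6.0)",
                "src_bytes > 28: normal (20.0)");
        Map<String, String> instance = Map.of(
                "src_bytes", "0", "dst_host_srv_count", "12", "dst_bytes", "0");
        // prints "anomaly because src_bytes <= 28 && dst_host_srv_count <= 88 && dst_bytes <= 3"
        System.out.println(explain(tree, instance));
    }
}
```

In your test loop you would build the attribute map from each test instance and call explain once per instance. Because the trace works on text, it should be checked against classifyInstance on a few records before you rely on it.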