The reason I want to do this is that I have an additional dataset of numbered files (let's call it the 'numberedFiles' dataset) whose numbers correspond to the string IDs on the instances, and I want to use the prediction labels as filters to select a subset of the numberedFiles dataset. For more context, I want to separate false positives from false negatives (this is binary classification with labels 'yes' and 'no').

Therefore, the theoretical workflow would be this:

1) Load the .arff file in Java and set up a FilteredClassifier with a RemoveType filter for the string ID.

2) Use the Evaluation class to perform cross-validation.

3) Somehow access the instances WITH their new prediction labels.

4) Loop over all instances WITH their new prediction label. If prediction=='no' and actual=='yes', write the string ID associated with the sample to a text file for false negatives. If prediction=='yes' and actual=='no', write the string ID associated with the sample to a text file for false positives.
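
To make step 4 concrete, here is a minimal sketch in plain Java. It assumes step 3 has already produced an (ID, actual label, predicted label) triple for every instance, which is exactly the part that is still unsolved. `ErrorSplitter`, `LabeledPrediction`, `Split` and the output file paths are hypothetical names used only for illustration; none of them are part of Weka.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Weka class): separates misclassified IDs
// for a binary problem with the class labels "yes" and "no".
final class ErrorSplitter {

    // One evaluated instance: its string ID plus actual and predicted labels.
    static final class LabeledPrediction {
        final String id;
        final String actual;
        final String predicted;

        LabeledPrediction(String id, String actual, String predicted) {
            this.id = id;
            this.actual = actual;
            this.predicted = predicted;
        }
    }

    // IDs of the misclassified instances, grouped by error type.
    static final class Split {
        final List<String> falsePositives = new ArrayList<>();
        final List<String> falseNegatives = new ArrayList<>();
    }

    // Collects false positives (predicted "yes", actual "no") and
    // false negatives (predicted "no", actual "yes"); correct predictions are skipped.
    static Split split(List<LabeledPrediction> rows) {
        Split result = new Split();
        for (LabeledPrediction row : rows) {
            if (row.predicted.equals("yes") && row.actual.equals("no")) {
                result.falsePositives.add(row.id);
            } else if (row.predicted.equals("no") && row.actual.equals("yes")) {
                result.falseNegatives.add(row.id);
            }
        }
        return result;
    }

    // Writes one ID per line to each text file (UTF-8).
    static void write(Split split, Path fpFile, Path fnFile) throws IOException {
        Files.write(fpFile, split.falsePositives);
        Files.write(fnFile, split.falseNegatives);
    }
}
```

Calling `ErrorSplitter.split(rows)` and then `ErrorSplitter.write(...)` with two file paths would give the two ID lists needed to filter the numberedFiles dataset. The open question remains how to build `rows` from the cross-validation results.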

As you can see, the problem is step 3. So far, I can only use 'eval.predictions()' to get a FastVector of NominalPrediction elements. These elements contain the predicted label, actual label, weight, and distribution, but they do not contain any information about the attributes of the instance (namely the string ID I need). I know that the Weka Explorer can output additional attributes of each instance (the string ID I need in this case), but I can't figure out how to access this in Java.

For reference, here is the code snippet:

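    // imports needed by the snippet below
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.FastVector;
    import weka.core.Instances;
    import weka.core.SelectedTag;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.unsupervised.attribute.RemoveType;

    // load the data; the class attribute is the last one
    Instances trainingData = DataSource.read("my_data_file.arff");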
    trainingData.setClassIndex(trainingData.numAttributes()-1);

    // build classifier
    J48 classifier = new J48();

    RemoveType removeID = new RemoveType();
    removeID.setAttributeType(new SelectedTag("Delete string attributes", RemoveType.TAGS_ATTRIBUTETYPE));

    FilteredClassifier meta = new FilteredClassifier();
    meta.setClassifier(classifier);
    meta.setFilter(removeID);

    Evaluation eval = new Evaluation(trainingData);
    Random r = new Random(1);
    // 10-fold
    eval.crossValidateModel(meta, trainingData, 10, r);
    System.out.println(eval.toSummaryString("=== Cross-Validation Summary ===\n", false));
    System.out.println(eval.toClassDetailsString("=== Detailed Accuracy By Class ===\n"));
    System.out.println(eval.toMatrixString("=== Confusion Matrix ===\n"));
    FastVector predictions = eval.predictions();

1 Answer

Sorry, but I am not sure I fully understand your need, especially when you talk about the string ID. RemoveType removes all attributes of a specified type, so if you specify "string" you will not have any string attributes afterwards. From what I understand, you would be better off using RemoveWithValues.

If you want to access the full set of attributes of each instance when predicting, you should not use Evaluation but the standard Weka code for classification. Please see "Classifying instances" here: http://weka.wikispaces.com/Use+Weka+in+your+Java+code