
I have a problem using the Weka API in Java. My training and testing datasets have 41 features (attributes). I want to keep only 25 of them (e.g. attributes 1, 3, 5, 7, 8, 10, ...) and remove the others when training and testing the classifier. I have read Weka's Filter documentation at http://weka.wikispaces.com/Use+WEKA+in+your+Java+code#Filter and the source at http://grepcode.com/file/repo1.maven.org/maven2/nz.ac.waikato.cms.weka/weka-stable/3.6.6/weka/filters/unsupervised/attribute/Remove.java, but I could not work out how to apply a filter to my problem. Could you please help me write the code for this? Any suggestions would be highly appreciated.

My code looks like this:

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

Instances train = ...   // training data, 41 attributes
Instances test = ...    // testing data, the same 41 attributes

// Here I want to keep only 25 of the 41 attributes (i.e. columns).

Classifier cls = new J48();
cls.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cls, test);
.....
.....

1 Answer


Assuming you have this, as you said:

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

Instances train = ...
Instances test = ...

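Then set up the array of column indices you want. You will probably build it in a for loop or similar, but I've just put in 6 indices manually so you get the idea. Note that Remove.setAttributeIndicesArray expects 0-based indices, so columns 1, 3, 5, 7, 8 and 10 become 0, 2, 4, 6, 7 and 9. (Alternatively, Remove.setAttributeIndices takes a String with 1-based indices, such as "1,3,5,7,8,10".) Also include the index of your class attribute, otherwise the filter will remove it.

int[] indicesOfColumnsToUse = {0, 2, 4, 6, 7, 9};  // 0-based: columns 1, 3, 5, 7, 8, 10

Then create and configure the Remove filter: set the column indices, invert the selection so that the listed columns are kept and every other column is removed, and finally set the "input format" from your training data (setInputFormat should be called after the other options are set).

Remove remove = new Remove();
remove.setAttributeIndicesArray(indicesOfColumnsToUse);  // 0-based indices
remove.setInvertSelection(true);                          // keep the listed columns, drop the rest
remove.setInputFormat(train);                             // call last, after setting the options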
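If you build the selection in a loop, a small helper can produce both forms the Remove filter accepts: the 0-based int[] for setAttributeIndicesArray and the 1-based comma-separated String for setAttributeIndices. The sketch below is plain Java with no Weka dependency; the class name ColumnSelection, its methods, the "every odd column" loop and the assumption that the class attribute is column 41 (the last one) are all hypothetical, for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not part of Weka): turns 1-based column numbers
// into the two formats that Weka's Remove filter accepts.
public class ColumnSelection {

    // 1-based column numbers -> 0-based array, as expected by
    // Remove.setAttributeIndicesArray(int[]).
    public static int[] toZeroBased(List<Integer> oneBasedColumns) {
        int[] result = new int[oneBasedColumns.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = oneBasedColumns.get(i) - 1;
        }
        return result;
    }

    // 1-based column numbers -> comma-separated range string such as "1,3,5",
    // as expected by Remove.setAttributeIndices(String).
    public static String toRangeString(List<Integer> oneBasedColumns) {
        StringJoiner joiner = new StringJoiner(",");
        for (int column : oneBasedColumns) {
            joiner.add(Integer.toString(column));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Integer> columns = new ArrayList<>();
        // Example loop (hypothetical selection rule): every odd column from 1 to 9.
        for (int c = 1; c <= 9; c += 2) {
            columns.add(c);
        }
        // Assumption: the class attribute is the last of 41 columns; keep it,
        // otherwise Remove with invertSelection=true would drop it.
        int classColumn = 41;
        if (!columns.contains(classColumn)) {
            columns.add(classColumn);
        }
        System.out.println(toRangeString(columns));               // 1,3,5,7,9,41
        System.out.println(Arrays.toString(toZeroBased(columns))); // [0, 2, 4, 6, 8, 40]
    }
}
```

Either result can then be passed to the matching Remove setter; just don't mix a 1-based list with the 0-based setter.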

Then apply the removal to your training set

Instances trainingSubset = Filter.useFilter(train, remove);
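To see why the test data has to go through the same filter, here is a plain-Java illustration (no Weka involved) of what keeping a set of columns does to individual rows. It is only a simplified model of Remove with invertSelection set to true, not Weka's implementation; the class ProjectionDemo and the sample rows are made up. Once the training rows are projected, a model trained on them only makes sense for test rows projected with exactly the same selection.

```java
import java.util.Arrays;

// Simplified, hypothetical model of "keep only the selected columns":
// each row keeps the selected 0-based columns, in their original order.
public class ProjectionDemo {

    public static double[] keepColumns(double[] row, int[] selected) {
        // Mark the columns to keep.
        boolean[] keep = new boolean[row.length];
        for (int index : selected) {
            keep[index] = true;
        }
        // Count them to size the output row.
        int count = 0;
        for (boolean k : keep) {
            if (k) {
                count++;
            }
        }
        // Copy the kept values, left to right.
        double[] out = new double[count];
        int pos = 0;
        for (int j = 0; j < row.length; j++) {
            if (keep[j]) {
                out[pos++] = row[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] selected = {0, 2, 4};
        double[] trainRow = {10, 11, 12, 13, 14, 15};
        double[] testRow = {20, 21, 22, 23, 24, 25};
        // The same selection applied to both rows keeps their columns aligned.
        System.out.println(Arrays.toString(keepColumns(trainRow, selected))); // [10.0, 12.0, 14.0]
        System.out.println(Arrays.toString(keepColumns(testRow, selected)));  // [20.0, 22.0, 24.0]
    }
}
```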

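And then go on as you said, except that the classifier is trained on the subset you just created. The test set has to be filtered with the same remove object, otherwise it still has all 41 attributes and won't match the trained model; likewise, build the Evaluation from the filtered training data:

Instances testSubset = Filter.useFilter(test, remove);  // same filter as the training set

Classifier cls = new J48();
cls.buildClassifier(trainingSubset);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(trainingSubset);
eval.evaluateModel(cls, testSubset);
System.out.println(eval.toSummaryString());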